
















(NASA-CP-3198)

GODDARD CONFERENCE ON MASS STORAGE SYSTEMS AND TECHNOLOGIES, VOLUME 1 (NASA) 333 p

N93-30449 --THRU-- N93-30480

Unclas

H1/82 0159090








NASA Conference Publication 3198, Vol. 1


Goddard Conference on 
Mass Storage Systems 
and Technologies 

Volume I 


Edited by 
Ben Kobler
Goddard Space Flight Center
Greenbelt, Maryland

P. C. Hariharan
STX Corporation
Lanham, Maryland


Proceedings of a conference held at
NASA Goddard Space Flight Center
Greenbelt, Maryland
September 22-24, 1992

National Aeronautics and 
Space Administration 

Office of Management 

Scientific and Technical 
Information Program 

1993 



Goddard Conference on Mass Storage Systems 
and Technologies 


Program Committee 

Ben Kobler, NASA/GSFC (Chair)

John Berbert, NASA/GSFC

William A. Callicott, NOAA/NESDIS

Sam Coleman, Lawrence Livermore National Laboratory

Susan Hauser, National Library of Medicine

Sanjay Ranade, Infotech SA, Inc.

Elizabeth Williams, Supercomputing Research Center

Jean-Jacques Bedet, Hughes STX

Alan Dwyer, Hughes STX

P. C. Hariharan, Hughes STX


Conference Coordinator

Nicki Fritz, Hughes STX


Production and Layout

Len Blasso, Hughes STX
Ann M. Lipscomb, Hughes STX





PREFACE 




Papers presented at the Goddard Conference on Mass Storage Systems and Technologies that
were submitted for publication in advance of the Conference appear in Volume 1 of these
Proceedings. Volume 2 contains additional papers and viewgraphs which were made available
at the time of the Conference, as well as reports of the keynote address, the after-dinner speech,
and the two panel discussions. We are grateful to all the authors for their contributions.

Dr. David Nelson, Director of the Office of Scientific Computing, Department of Energy, opened
the conference with a keynote address that began by identifying projects and activities that
are, or will be, generating massive volumes of data. Some of the grand challenge problems of
the High Performance Computing and Communications initiative are likely to rival, or even
surpass, the Earth Observing System in the amount of data they create. Managing such large
archives is itself likely to prove a grand challenge. He referred to inaccessible data as the
"landfill of cyberspace." Learning to answer unanticipated questions, revising data structures
as requirements evolve, doing this in a cost-effective and practical manner in a hierarchical
storage system, and dealing with distributed data bases that are networked together will tax
both human ingenuity and resources.

Mass storage systems have now truly begun to be massive, with data ingestion rates
approaching terabytes per day. At the same time, the identifiable unit for processing purposes
(file, granule, dataset or some similar object) has also increased in size, and could begin to pose
a challenge to traditional file systems that impose limits on both the size of the objects and
the number of objects in the file system. Even the casual user needs more than the object name,
the size and date of the creation of the object, and the limited metadata provided with classical
directory systems. Some of these issues are addressed by the IEEE Mass Storage System
Reference Model (MSS RM), which is seeking to provide a framework in which hardware and
software from different vendors can act cooperatively and harmoniously to store, manage and
distribute data. Dr. Sam Coleman of the Lawrence Livermore National Laboratory and Mr. Bob
Coyne of the IBM Federal Sector Division discussed the history and current status of the
Reference Model. Version 5 of the MSS RM will appear in April 1993 as a Recommended
Practice instead of as a Guide. The Storage Systems Standards Working Group (SSS WG) is
focusing on decomposing storage systems into interoperable functional modules
which vendors may offer as separate products, and on defining standard interfaces through
which clients may be provided direct access to storage systems services. Bob Coyne pointed out
that the data management, database, and file system development and user communities are
not represented in the SSS WG, and issued a plea for their active participation in the activities
and deliberations of the WG. Those interested in the SSS WG discussions may keep abreast by
sending e-mail to ieee-sxss-request@nas.nasa.gov with the request that their name and address
be included in the WG reflector. General discussions on mass storage problems are also
published in the USENET newsgroup comp.arch.storage.

Standards are essential to ensure wide availability, multi-sourcing, and interchangeability.
Mr. Al Dwyer, representing the NASA-OSSA Office of Standards and Technology, spoke about
the role of this office. He was followed by Mr. Jean-Paul Emard, ANSI X3 Committee Director,
Mr. Sam Cheatham of the X3B5 Committee, and Mr. Ken Hallam of the X3B11 Committee, who
discussed the ANSI standards-making process, the work on magnetic media standards, and the
status of the optical media standards, respectively.

The sheer size of the inventories makes distributed systems attractive. Bob Coyne discussed
the National Storage Asset Laboratory at the National Energy Research Supercomputer Center
of the Department of Energy; this will be a testbed for network-attached storage devices. In this
configuration, the devices will be nodes in a network, and will provide read/write services to
authorized clients on the network without the need for the data to pass through the memory of
a computer controlling the devices. Experiences from the archives at NOAA, the National
Space Science Data Center at NASA, the EROS Data Center of the USGS^1, and the National
Library of Medicine were complemented by a discussion of the information management
challenge posed by the Earth Observing System. Dr. Ackerman of the National Library of
Medicine pointed out that while there is much discussion of gigabit networks and petabyte-
sized inventories, there are still problems today in distributing much smaller files to a user
community not fortunate enough to be plugged into the latest wideband network. Browsing is a
significant component of the activity at large holdings, and Dr. Ken Salem described one way
to handle this.

High-volume holdings require high-performance storage devices. The idea of using a
Redundant Array of Inexpensive Disks to provide increased bandwidth and reliability had
previously been espoused by Garth Gibson and others, and Dr. Gibson provided a simplified
explanation of it in his tutorial lecture. A natural outgrowth of the RAID idea is that of RATS
(Redundant Array of Tape Systems), and Ms. Ann Drapeau of the University of California at
Berkeley took up this topic in her tutorial.

Professor Mark Kryder, Director of the Engineering Research Center in Data Storage Systems
at Carnegie Mellon University, Pittsburgh, PA, discussed the future evolution of magnetic and
magneto-optic storage systems in his talk on ultra-high density recording technologies. In
cooperation with the National Storage Industry Consortium, the Center has selected the goals
of achieving 10 Gbit/in^2 recording density in magnetic and magneto-optic disk recording, and
1 Tbit/in^3 in magnetic tape recording.

The National Media Laboratory (NML) has been in existence since 1989, and Dr. Gary Ashton
provided an overview of its structure, scope and mission and reported on NML testing results of
D-1 cassettes. A different perspective, that of the system integrator, was furnished by Mr.
Richard Lee in his talk on grand challenges in mass storage.

Recent magnetic and optical recording technologies were described in a number of papers.
Optical recording, traditionally available on disks, is now possible on tape. ICI Imagedata,
which has pioneered the concept of the digital paper and subjected its product to one of the
largest suites of tests, now has competition from the Dow Chemical Company and from
Eastman Kodak. While optical storage has generally been understood to involve ablation (pit-
forming), phase change, or alloy formation (respectively the modes of the ICI, Eastman Kodak
and the Dow products), Optex has a medium that uses a different technique for optical data
storage. This involves excitation of electrons, and trapping the excited electrons in metastable
states on a receptor ion. The method is interesting and intriguing because, unlike other
technologies, it exhibits a linear response and can therefore store more than just one bit per
"cell." A panel discussion on the comparative merits of magnetic and optical storage, and their
future, followed these papers.

Dr. Dennis Speliotis, a veteran in the field of magnetic storage, was the after-dinner speaker at
the Conference Banquet. He reminisced about his experiences over more than three decades in
magnetic storage and related stories of both success and failure. His parting words were
significant: the way to make progress is through evolution, not revolution; the chances of
failure when one attempts a dramatic change, a drastic departure from the conventional, are
very high, certainly in the short term; but small, evolutionary step-changes are more likely to
succeed.

Mr. Dale Lancaster of Convex Systems presented what the "state of the art" is in Mass Storage
Technology. Drs. Elizabeth Williams and Tom Myers discussed the need for, and the nature of,
the types of measurements and metrics of distributed and heterogeneous storage systems.
Measurements were reported by Ms. Nancy Yeager of the National Center for Supercomputing
Applications. Mr. Bill Collins of the Los Alamos National Laboratory presented an overview of
the High Performance Data System being developed there, and Dr. Milt Halem, from the NASA
Goddard Space Flight Center, gave a critical and comparative analysis of three application-
dependent mass storage systems being built at Goddard.

^1 Although John Boyd was unable to present his paper "Interim Report on Landsat National
Archive Activities," it is nevertheless included in these proceedings.

Mr. James F. Berry, of the Department of Defense, chaired a panel discussion on High
Performance Helical Scan Recording Systems. Representatives from Ampex, Datatape, GE,
Sony and StorageTek were the participants.

The performance of the low-end helical scan tape drives was the topic of papers by Dr.
Chinnaswamy, formerly of Digital Equipment Corporation, and by Mr. Gerry Schadegg of
Exabyte Corporation. Exabyte now provides an on-line Technical Support Bulletin Board
System (BBS). Banana Boat, as the BBS is called, can be accessed by dialing (303) 442-4323.
The BBS contains information such as microcode history, technical bulletins, white papers,
and articles of interest to 8 mm product users. Mr. Schadegg advised users of 8 mm drives that
those drives were not designed for 100% duty cycle, but only for 2[?]% to 30%. He also cautioned
users that the small, handy size of the cassette should not lull them into thinking that the
media does not require a controlled environment for storage, shipping and operation. Finally,
tips on reducing file read latencies were discussed by Mr. R. Hugo Patterson of Carnegie Mellon
University.

A number of posters were presented on the first day of the conference. 

Our thanks go, in addition to the authors, to the following persons and organizations:

Dr. David Nelson, Department of Energy, the keynote speaker,

Dr. Dennis Speliotis, the after-dinner speaker,

the following session and panel discussion chairs:

Dr. Joe King, NASA/GSFC,

Dr. Mark Kryder, Carnegie Mellon University,

Dr. Milt Halem, NASA/GSFC,

Mr. James F. Berry, Department of Defense,

the following members of the program committee:

Mr. Jean-Jacques Bedet, Hughes STX Corporation,

Mr. Bill Callicott, NOAA,

Dr. Sam Coleman, Lawrence Livermore National Laboratory,

Mr. Alan M. Dwyer, Hughes STX Corporation,

Dr. Susan Hauser, National Library of Medicine,

Dr. Sanjay Ranade, Infotech SA, Inc.,

Dr. Elizabeth Williams, Supercomputing Research Center,

and to:

Ms. Nicki Fritz, the conference coordinator,

Westover Consulting for conference arrangements,

and Mr. Len Blasso and Ms. Ann Lipscomb for their help with the production of this document.

We are grateful to Mr. Laurence Lueck, President of Magnetic Media Information Services, for
permission to reproduce the David-and-Goliath cover art from Volume XIII, Number 1, of the
Magnetic Media International Newsletter.


Ben Kobler, NASA/GSFC

John Berbert, NASA/GSFC

P. C. Hariharan, Hughes STX Corporation



TABLE OF CONTENTS


Volume I


Mass Storage System Reference Model: Version 4, Sam Coleman and Steve Miller,
Lawrence Livermore National Lab

Optical Media Standards for Industry, Kenneth J. Hallam, ENDL Associates

Technology for National Asset Storage Systems, Robert A. Coyne and Harry Hulen,
IBM Federal Sector Division - Houston, and Richard Watson, Lawrence Livermore

The Visible Human Project of the National Library of Medicine:
Remote Access and Distribution of a Multi-Gigabyte Data Set, Michael J. Ackerman,
National Library of Medicine

Data Management in NOAA, William M. Callicott, National Oceanic and Atmospheric
Administration

Interim Report on Landsat National Archive Activities, John E. Boyd,
U. S. Geological Survey, EROS Data Center

MR-CDF: Managing Multi-Resolution Scientific Data, Kenneth Salem, University of
Maryland at College Park

High-Performance Mass Storage System for Workstations, T. Chiang, Y. Tang, L. Gupta,
and S. Cooperman, Loral AeroSys

GE Networked Mass Storage Solutions Supporting IEEE Network Mass Storage Model,
Donald Herzog, GE Aerospace

High-Speed Data Duplication/Data Distribution - An Adjunct to the Mass Storage
Equation, Kevin Howard, Exabyte Corporation

The Fundamentals and Futures of Removable Mass Storage Alternatives,
Linda Kempster, Strategic Management Resources, Ltd.

The NT Digital Micro Tape Recorder, Toshikazu Sasaki, John Alstad, and Mike Younker,
Sony Magnetic Products, Inc.

RAID 7 Disk Array, Lloyd Stout, AC Technology Systems

Tutorial: Performance and Reliability in Redundant Disk Arrays, Garth A. Gibson,
Carnegie Mellon University

Striped Tertiary Storage Arrays, Ann L. Drapeau, University of California at Berkeley

National Media Laboratory Media Testing Results, William Mularie and Gary Ashton,
National Media Laboratory

Evaluation of D-1 Tape and Cassette Characteristics: Moisture Content of Sony and
Ampex D-1 Tapes When Delivered, Gary Ashton, National Media Laboratory

Grand Challenges in Mass Storage - A Systems Integrators Perspective, Richard R. Lee,
Data Storage Technologies, Inc., and Dan Mintz, W. J. Culver Consulting

The Modern High Rate Digital Cassette Recorder, Martin Clemow,
Penny & Giles Data Systems, Inc.

Towards a 1000 Tracks Digital Tape Recorder, J. M. Coutellier, J. P. Castera, J. Colineau,
J. C. Lehureau, F. Maurice, and C. Hanna, Laboratoire Central de Recherches

Evolution of a High-Performance Storage System Based on Magnetic Tape
Instrumentation Recorders, Bruce Peters, Datatape, Inc.

Mass Optical Storage - Tape (MOST), William S. Oakley, Lasertape, Inc.

ICI Optical Data Storage Tape - An Archival Mass Storage Media, Andrew J. Ruddick,
ICI Imagedata

Flexible Storage Medium For Write-Once Optical Tape, Andrew J. G. Strandjord,
Steven P. Webb, Donald J. Perettie, and Robert A. Cipriano,
The Dow Chemical Company

Electron Trapping Data Storage Systems and Applications (Abstract),
Daniel Brower, Allen Earman, and M. H. Chaffin, Optex Corporation

The "State" of "The State of The Art" in Mass Storage Technology, Dale Lancaster,
Convex Computer Corporation

Measurements over Distributed High Performance Computing and Storage Systems
(Abstract), Elizabeth Williams, Supercomputing Research Center, and Tom Myers,
Department of Defense

Analysis of Cache for Streaming Tape Drive, V. Chinnaswamy, Digital Equipment
Corporation

LANL High-Performance Data System (HPDS), M. William Collins, Danny Cook,
Lynn Jones, Lynn Kluegel, and Cheryl Ramsey, Los Alamos National Laboratory

Optimizing Digital 8mm Drive Performance, Gerry Schadegg, Exabyte Corporation

Using Transparent Informed Prefetching (TIP) to Reduce File Read Latency,
R. H. Patterson, G. A. Gibson, and M. Satyanarayanan, Carnegie Mellon University



Volume II


Keynote Address, David Nelson, Department of Energy

Current State of the Mass Storage Reference Model, Robert Coyne, IBM Federal Systems
Company

The Standards Process: X3 Information Processing Systems, Jean-Paul Emard,
Computer and Business Equipment Manufacturers Association

The Standards Process: Technical Committee X3B5 Digital Magnetic Tape,
Sam Cheatham, Storage Technology Corporation

Data Management in NOAA (Viewgraphs), William M. Callicott, NOAA/NESDIS

Analysis of the Data and Media Management Requirements at the NASA National Space
Science Data Center (Text Not Made Available), Ron Blitstein, Hughes STX Corporation

Accessing Earth Science Data from the EOS Data and Information System,
Kenneth R. McDonald and Sherri Calvo, NASA Goddard Space Flight Center

Recording and Wear Characteristics of 4 and 8 mm Helical Scan Tapes, Klaus J. Peter,
Media Logic, Inc., and Dennis Speliotis, Advanced Development Corporation

Striped Tape Arrays (Viewgraphs), Ann L. Drapeau, University of California at Berkeley

Ultra-High Density Recording Technologies, Mark H. Kryder, Carnegie Mellon University

National Media Laboratory Media Testing Results (Viewgraphs), Bill Mularie and
Gary Ashton, National Media Laboratory

Grand Challenges in Mass Storage, "A System Integrator's Perspective" (Viewgraphs),
Don Mintz, W. J. Culver Consulting, and Richard Lee, Data Storage Technologies,
Incorporated

Kodak Phase-Change Media for Optical Tape Applications, Yuan-sheng Tyan,
Donald R. Preuss, George R. Olin, Fridrich Vazan, Kee-chuan Pan, and
Pranab K. Raychaudhuri, Eastman Kodak Company

Electron Trapping Optical Data Storage System and Applications, Daniel Brower,
Allen Earman, and M. H. Chaffin, Optex Corporation



Panel Discussion on Magnetic/Optical Recording Technologies,
Moderator: P. C. Hariharan, Hughes STX

Data Storage: Retrospective and Prospective, Dennis Speliotis, Advanced Development
Corporation

Measurements over Distributed High Performance Computing and Storage Systems
(Paper and Viewgraphs), Elizabeth Williams, Supercomputing Research Center, and
Tom Myers, Department of Defense

Performance of a Distributed Superscalar Storage Server, Arlan Finestead, University of
Illinois, and Nancy Yeager, National Center for Supercomputing Applications

The Redwood Project: An Overview, Sam Cheatham, Storage Technology Corporation

Architectural Assessment of Mass Storage Systems at GSFC, M. Halem, J. Behnke,
P. Pease, and N. Palm, NASA Goddard Space Flight Center

Panel Discussion on High Performance Helical Scan Recording Systems,
Moderator: James F. Berry, Department of Defense



N93-30450

Mass Storage System Reference Model: 

Version 4 


Developed by the IEEE Technical Committee on Mass Storage 

Systems and Technology 


Edited by: 


Sam Coleman 

Lawrence Livermore National Laboratory

Steve Miller 
SRI International 


Sam Coleman

Lawrence Livermore National Lab
Mail Stop L-60
P. O. Box 808
Livermore, CA 94550 


All rights reserved by the Technical Committee on Mass Storage Systems and Technology, Institute of
Electrical and Electronics Engineers, Inc.

This is an unapproved draft subject to change and cannot be presumed to reflect the position of the
Institute of Electrical and Electronics Engineers, Inc.




1. Preface


The purpose of this reference model is to identify the high level
abstractions that underlie modern storage systems. The information to
generate the model was collected from major practitioners who have
built and operated large storage facilities, and represents a
distillation of the wisdom they have acquired over the years. The
model provides a common terminology and set of concepts to allow
existing systems to be examined and new systems to be discussed and
built. It is intended that the model and the interfaces identified from
it will allow and encourage vendors to develop mutually compatible
storage components that can be combined to form integrated storage
systems and services.

The reference model presents an abstract view of the concepts and
organization of storage systems. From this abstraction will come the
identification of the interfaces and modules that will be used in IEEE
storage system standards. The model is not yet suitable as a standard:
it does not contain implementation decisions, such as how abstract
objects should be broken up into software modules or how software
modules should be mapped to hosts; it does not give policy
specifications, such as when files should be migrated; it does not
describe how the abstract objects should be used or connected; and it
does not refer to specific hardware components. In particular, it does
not fully specify the interfaces.

A storage system is the portion of a computing facility responsible for
the long-term storage of large amounts of information. It is usually
viewed as a shared facility and has traditionally been organized around
specialized hardware devices. It usually contains a variety of storage
media that offer a range of tradeoffs among cost, performance,
reliability, density, and power requirements. The storage system
includes the hardware devices for storing information, the
communication media for transferring information, and the software
modules for controlling the hardware and managing the storage.


The size and complexity of this software are often overlooked, and its
importance is growing as computing systems become larger and more
complex. Large storage facilities tend to grow over a period of years
and, as a result, must accommodate a collection of heterogeneous
equipment from a variety of vendors. Modern computing facilities are
putting increasing demands on their storage facilities. Often, large
numbers of workstations as well as specialized computing machines such
as mainframes, mini-supercomputers, and supercomputers are attached to
the storage system by a communication network. These computing
facilities are able to generate both large numbers of files and large
files, and the requirements for transferring information to and from
the storage system often overwhelm the networks.

The type of environment described above is the one that places the
greatest strain on a storage system design, and the one that most needs
a storage system. The abstractions in the reference model were selected
to accommodate this type of environment. While they are also suitable
for simpler environments, their desirability is perhaps best
appreciated when viewed from the perspective of the most complicated
environment.

There is a spectrum of system architectures, from storage services
being supplied as single nodes specializing in long-term storage to
what is referred to as "fully distributed systems". The steps in this
spectrum are most easily distinguished by the transparencies that they
provide, where they are provided in the site configuration, and whether
they are provided by a site administrator or by system management
software. The trend toward distributed systems is appealing because it
allows all storage to be viewed in the same way, as part of a single
large, transparent storage space that can be globally optimized. This
is especially important as systems grow more complex and better use of
storage is required to achieve satisfactory performance levels.
Distributed systems also tend to break the dependence on single,
powerful storage processors and may increase availability by reducing
reliance on single nodes.


1.1 Transparencies 

Many aspects of a distributed system are irrelevant to a user of the
system. As a result, it is often desirable to hide these details from
the user and provide a higher-level abstraction of the system. Hiding
details of system operation or behavior from users is known as
providing transparency for those details. Providing transparency has
the effect of reducing the complexity of interacting with the system
and thereby improving the dependability, maintainability, and usability
of applications. Transparency also makes it possible to change the
underlying system because the hidden details will not be embedded in
application programs or operating practices.
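
The point about changing the underlying system can be illustrated with a small sketch. It is not part of the reference model: the `Store` interface, the two implementations, and the `word_count` application function are hypothetical names chosen for illustration, and the "remote" store only simulates a network by serializing each request and reply. Under those assumptions, the application depends on the interface alone, so either implementation can be substituted without touching it.

```python
import json
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical client-visible interface; hides where and how bytes are kept."""

    @abstractmethod
    def write(self, name, data): ...

    @abstractmethod
    def read(self, name): ...


class LocalStore(Store):
    """Keeps objects in a dictionary inside the caller's own process."""

    def __init__(self):
        self._objects = {}

    def write(self, name, data):
        self._objects[name] = bytes(data)

    def read(self, name):
        return self._objects[name]


class RemoteStore(Store):
    """Stands in for a network service: every call crosses a simulated wire."""

    def __init__(self):
        self._server = {}   # state held by the simulated remote node
        self.messages = 0   # number of request/reply exchanges so far

    def _call(self, request):
        wire = json.dumps(request)          # marshal the request
        self.messages += 1
        req = json.loads(wire)              # "server" side unmarshals it
        if req["op"] == "write":
            self._server[req["name"]] = req["data"]
            reply = {"ok": True}
        else:
            reply = {"ok": True, "data": self._server[req["name"]]}
        return json.loads(json.dumps(reply))  # marshal and unmarshal the reply

    def write(self, name, data):
        self._call({"op": "write", "name": name, "data": bytes(data).hex()})

    def read(self, name):
        return bytes.fromhex(self._call({"op": "read", "name": name})["data"])


def word_count(store, name):
    """Application code: written once, against the Store interface only."""
    return len(store.read(name).split())
```

Calling `word_count` with a `LocalStore` or a `RemoteStore` holding the same object gives the same answer; only the hidden message traffic differs, which is the kind of detail the text says should stay out of application programs.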

The disadvantage of using transparency is that some efficiency can be
lost in resource usage or performance. This occurs because the
mechanism that provides the transparency masks semantic information and
causes the system to be used conservatively. High-performance data base
systems, for example, may need to organize disk storage directly and
schedule disk operations to gain performance, rather than depend on
lower-level file systems with their own structure, scheduling, and
policies for caching and migration.

There is a range of support that can be provided for distributed
systems in a computer network. A system with few transparencies is
often called a networked system. The simplest kind of networked system
provides utilities to allow a program to be started on a specified host
and information to be transferred between specified storage devices.
Examples include TELNET and FTP, respectively. This type of system
rarely provides support for heterogeneity. At the other end of the
spectrum are fully distributed systems that provide many
transparencies. An example is LOCUS. In distributed systems, a goal is
for workstations to appear to have unlimited storage and processing
capacities.


System and application designers must think carefully about what
transparencies will be provided and whether they will be mandatory. It
is possible for applications to provide certain transparencies and not
others. Fundamental transparencies can be implemented by the system,
saving each user from re-implementing them. A common implementation
will also improve the likelihood that the transparency will be
implemented efficiently.

The common transparencies are:

Access

Clients do not know if objects or services are local or remote.

Concurrency

Clients are not aware that other clients are using services
concurrently.

Data representation

Clients are not aware that different data representations are used in
different parts of the system.

Execution

Programs can execute in any location without being changed.

Fault

Clients are not aware that certain faults have occurred.

Identity

Services do not make use of the identity of their clients.

Location

Clients do not know where objects or services are located.

Migration

Clients are not aware that services have moved.

Naming

Objects have globally unique names which are independent of resource
and accessor location.

Performance

Clients see the same performance regardless of the location of objects
and services (this is not always achievable unless the user is willing
to slow down local performance).

Replication

Clients do not know if objects or services are replicated, and services
do not know if clients are replicated.

Semantic

The behavior of operations is independent of the location of operands
and the type of failures that occur.

Syntactic

Clients use the same operations and parameters to access local and
remote objects and services.

Some of the transparencies overlap or include others.
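
As a rough illustration of how the naming, location, and migration transparencies fit together, consider the sketch below. Everything in it is an assumption made for the example rather than something the reference model specifies: the `NameService` and `StorageSystem` classes, the use of random UUIDs as globally unique names, and the placeholder policy of placing new objects on the first node. Clients hold only a name; the node that currently stores the object is looked up on every access, so moving the object does not disturb them.

```python
import uuid


class NameService:
    """Hypothetical registry mapping unique object names to their current node."""

    def __init__(self):
        self._location = {}              # object name -> node name

    def register(self, node):
        name = uuid.uuid4().hex          # unique name with no location embedded in it
        self._location[name] = node
        return name

    def locate(self, name):
        return self._location[name]

    def relocate(self, name, node):
        self._location[name] = node


class StorageSystem:
    """Client-facing facade: callers pass only a name, never a node."""

    def __init__(self, node_names):
        self._nodes = {n: {} for n in node_names}   # node -> {name: bytes}
        self._default = node_names[0]               # placeholder placement policy
        self._names = NameService()

    def create(self, data):
        name = self._names.register(self._default)
        self._nodes[self._default][name] = bytes(data)
        return name

    def read(self, name):
        # Location transparency: the node is resolved here, not by the client.
        return self._nodes[self._names.locate(name)][name]

    def migrate(self, name, target):
        # Migration transparency: the name the client holds stays valid.
        source = self._names.locate(name)
        self._nodes[target][name] = self._nodes[source].pop(name)
        self._names.relocate(name, target)

    def node_of(self, name):
        """Administrative view for inspection; ordinary clients do not need it."""
        return self._names.locate(name)
```

A client that created an object while it lived on a "disk" node can read it with the same name after an administrator migrates it to a "tape" node. A real system would add the staging delay that the performance transparency entry warns about.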

With this in mind, it is incumbent upon the Storage System Standards
Working Group to identify interfaces and modules that are invariant
from single storage nodes to fully distributed systems. Many sites are
not likely to embrace fully distributed systems in a single step.
Rather, they are likely to evolve gradually as growing system size and
complexity dictate and as vendors make available products supporting
fully distributed systems.

1.2 Requirements 

Modern computing facilities are large and complex. They contain a
diverse collection of hardware connected by communication networks, and
are used by a wide variety of users with a spectrum of often-conflicting
requirements. The hardware includes a range of processors from personal
computers and workstations to mainframes and supercomputers, and many
types of storage devices such as magnetic disks, optical disks, and
magnetic tapes. This equipment is typically supplied by a variety of
vendors and, as a result, is usually heterogeneous. Both the hardware
characteristics and the user requirements make this type of facility
extremely complicated.

To ensure that the reference model applies to many computer
environments, the IEEE Technical Committee on Mass Storage Systems and
Technology identified the following requirements:


• The model should support both centralized and distributed
  hierarchical, multi-media file systems.

• The model should support the simplest randomly addressable file
  abstraction out of which higher level file structures can be created
  (e.g., a segment of bits or bytes and a header of attributes).

• Where the defined services are appropriate, the model should use
  national or international standard protocols and interfaces, or
  subsets thereof.

• The model should be modular such that it meets the following needs:

    - The modules should make sense to produce commercially.

    - It should be reasonable to integrate modules from two or more
      vendors.

    - The modules should integrate with each other and existing
      operating systems (centralized and distributed), singly or
      together.

    - It should be possible to build hierarchical centralized or
      distributed systems from the standard modules. The hierarchy
      might include, for example, solid state disks, rotating disks
      (local and remote), an on-line library of archival tape
      cartridges or optical disks, and an off-line, manually-operated
      archival vault.

    - Module interfaces should remain the same even though
      implementations may be replaced and upgraded over time.

    - Modules should have standardized interfaces hiding implementation
      details. Access to module objects should only be through these
      interfaces. Interfaces should be specified by the abstract object
      data structures visible at those interfaces.

    - Module interfaces should be media independent.


• File operations and parameters should
meet the following requirements:

- Access to local and remote resources 
should use the same operations and 
parameters. 

- Behavior of an operation should be 
independent of operand location. 

- Performance should be as indepen- 
dent of location as possible. 

- It should be possible to read and
write both whole files and arbi-
trary-sized, randomly-accessible
pieces of files.

- The model should separate policy
and mechanism such that it
supports standard as well as vendor-
or site-specific policy submodules
and interfaces for access control, ac-
counting, allocation, site manage-
ment, security, and migration.

- The model should provide for de-
bugging, diagnostics, and mainte-
nance.

- The model should support a re-
quest/reply (transaction) oriented
communication model.

- Request and data communication 
associations should be separated to 
support high speed direct source to 
destination data channels. 

- Transformation services (e.g.,
translation, check summing, en-
cryption) should be supported.

• The model should meet the following
naming requirements:

- Objects should have globally
unique, machine-oriented names
which are independent of resource
and access location.

- Each operating system or site en-
vironment may have a different
human-oriented naming system,
therefore human- and machine-
oriented naming should be clearly
separated.


- Globally unique, distributively
generated, opaque file identifiers
should be used at the client-to-
storage-system interface.

• The model should support the 
following protection mechanism 
requirements: 

- System security mechanisms 
should assume mutual suspicion 
between nodes and networks. 

- Mechanisms should exist to
establish access rights independent 
of location. 

- Access list, capability, or other site-,
vendor-, or operating-system-specific
access control should be
supportable.

- Security or privacy labels should 
exist for all objects. 

• The model should support appropriate 
lock types for concurrent file access. 

• Lock mechanisms for automatic mi-
gration and caching (i.e., multiple
copies of the same data or files) should
be provided.

• The model should provide mechanisms
to aid recovery from network, client, and
server crashes and protection against
network or interface errors. In par-
ticular, except for file locks, the file
server should be stateless (e.g., no state
maintained between "open" and
"close" calls).

• The model should support the concept
of fixed and removable logical volumes
as separate abstractions from physical
volumes.

• It should be possible to store one or 
many logical volumes on a physical 
volume, and one logical volume should 
be able to span multiple physical vol- 
umes. 
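Two of the requirements above, random access to arbitrary-sized
pieces of a file and a file server that keeps no state between
"open" and "close" calls, can be illustrated together. The
following is only a minimal sketch under stated assumptions: the
names StatelessFileServer and ReadRequest, the in-memory store,
and the use of random UUIDs as opaque identifiers are all
hypothetical and are not taken from the reference model.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    """Self-describing request: everything the server needs is in the message."""
    bitfile_id: str  # opaque, machine-oriented identifier (hypothetical format)
    offset: int      # byte offset of the piece to read
    length: int      # number of bytes to read


class StatelessFileServer:
    """Holds stored data only; keeps no per-client or per-session state."""

    def __init__(self):
        self._store = {}

    def create(self, data: bytes) -> str:
        # A random UUID can be generated by any node without coordination.
        bitfile_id = uuid.uuid4().hex
        self._store[bitfile_id] = bytearray(data)
        return bitfile_id

    def read(self, request: ReadRequest) -> bytes:
        data = self._store[request.bitfile_id]
        return bytes(data[request.offset:request.offset + request.length])

    def write(self, bitfile_id: str, offset: int, piece: bytes) -> None:
        data = self._store[bitfile_id]
        if offset > len(data):
            # Pad with zero bytes so the piece lands at the requested offset.
            data.extend(b"\x00" * (offset - len(data)))
        data[offset:offset + len(piece)] = piece


server = StatelessFileServer()
file_id = server.create(b"hello world")
server.write(file_id, 6, b"there!")
print(server.read(ReadRequest(file_id, 0, 12)))
```

Because each request names the file, the offset, and the length,
a server that crashes and restarts can resume service without
reconstructing any client session, which is the motivation the
requirement gives for statelessness.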


2 Introduction


2.1 Background 

From the early days of computers, "storage"
has been used to refer to the levels of storage
outside the central processor. If "memory"
is differentiated to be inside the central
processor and "storage" to be outside (i.e.,
requiring an input-output channel to
access), the first level of storage is called
"primary storage" (Grossman 89). The
predominant technology for this level of
storage has been magnetic disk, or solid-
state memory configured to emulate mag-
netic disks, and will remain so for the
foreseeable future in virtually every size of
computer system from personal computers
to supercomputers. Magnetic disks con-
nected directly to I/O channels are often
called "local" disks while magnetic disks
accessed through a network are referred to
as "remote" or "central" disks. Sometimes
a solid-state cache is interposed between
the main memory and primary storage.
Because networks have altered the access to
primary storage, we will use the terms "local
storage" and "remote storage" to
differentiate the different roles of disks.

The next level of data storage is often a
magnetic tape library. Magnetic tape has
also played several roles:

• On-line archive known as "long term
storage" (e.g., less active storage than
magnetic disk),

• off-line archival storage (possibly off-
site),

• backup for critical files, and

• as an I/O medium (transfer to and from
other systems).

Magnetic tape has been used in these roles
because it has enjoyed the lowest cost-per-
bit of any of the widely used technologies.
As an I/O medium, magnetic tape must
conform to standards such that the tape can
be written on one system and read on
another. This is not necessarily true for
archival or backup storage roles, where
nonstandard tape sizes and formats can be
used, even though there are potential
disadvantages if standards are not used
even for these purposes.

In the early 1970s nearly every major
computer vendor, a number of new com-
panies, and vendors not otherwise in the
computer business, developed some type of
large peripheral storage device. Burroughs
and Bryant experimented with spindles of
4-ft diameter magnetic disks. Control Data
experimented with 12-in.-wide magnetic
tape wrapped nearly all the way around a
drum with a head per track. The tape was
moved to an indexed location and stopped
while the drum rotated for operation.
(Davis 82 presents an interesting com-
parison of devices that actually got to the
marketplace.)

Examples of early storage systems are the
Ampex Terabit Memory (TBM) (Wildmann
75), IBM 1360 Photostore (Kuehler 66),
Braegan Automated Tape Library, IBM 3850
Mass Storage System (Harris 75, Johnson
75), Fujitsu M861, and the Control Data
38500. One of the earliest systems to em-
ploy these devices was the Department of
Defense Tablon system (Gentile 71), which
made use of both the Ampex TBM and the
IBM Photostore. Much was learned about
the software requirements from this
installation.

The IBM 1360, first delivered in the late
1960s, used write-once, read-many (WORM)
chips of photographic film. Each chip
measured 1.4 x 2.8 in. and stored 5 megabits
of data. "File modules" were formed of
either 2250 or 4500 cells of 32 chips each.
The entire process of writing a chip,
photographically developing it, inserting
the chip in a cell, and a cell in a file module,
storing and retrieving for read, etc., was
managed by a control processor similar to
an IBM 1800. The complex chemical and
mechanical processing required consider-
able maintenance expertise and, while the
Photostore almost never lost data, the
maintenance cost was largely responsible
for its retirement. A terabit system could
retrieve a file in under 10 seconds.

The TBM, first delivered in 1971, was a
magnetic tape drive that used 2-inch-wide
magnetic tape in large 25,000-foot reels.
Each reel of tape had a capacity of 44 giga-
bits and a file could be retrieved, on the
average, in under 17 seconds. With two
drives per module, a ten module (plus two
control modules) system provided a terabit
of storage. The drive was a digital re-
engineering of broadcast video technology.
The drive connected to a channel through a
controller, and cataloging was the respon-
sibility of the host system.

The Braegan Automated Tape Library was
first delivered in the mid 1970s and con-
sisted of special shelf storage housing
several thousand half-inch magnetic tape
reels, a robotic mechanism for moving
reels between shelf storage and self-
threading tape drives, and a control
processor. This conceptually simple system
was originally developed by Xytecs, sold to
Calcomp, and then to Braegan. In late 1986,
the production rights were acquired by
Digital Storage Systems, Longmont,
Colorado. Sizes vary, but up to 8,000 tape
reels (9.6 terabits) and 12 tape drives per
cabinet are fairly common.

The IBM 3850 (Johnson 75, Harris 75) used
a cartridge with a 2.7-inch-wide, 770-in.-
long magnetic tape. A robotic cartridge
handler moved cartridges between their
physical storage location (sometimes called
the honeycomb wall) and read/write
devices. Data accessed by the host was
staged to magnetic disk for host access.
Destaging the changed pages (about 2
megabits) occurred when those pages
became the least recently used pages on the
staging disks. Staging disks consisted of a
few real disk devices, which served as
buffers to the entire tape cartridge library.
The real disks were divided into pages and
used to make up many virtual disk devices
that could appear to be on-line at any given
time.

Manufactured by Fujitsu and marketed in
this country by MASSTOR and Control
Data, the M861 storage module uses the
same data cartridge as the IBM 3850;
however, it is formatted to hold 175
megabytes per cartridge. The M861 holds up
to 316 cartridges and provides unit capacity
of 0.44 terabits. The physical cartridges are
stored on the periphery of a cylinder, where
a robotic mechanism picks them for the
read-write station. The unit achieves about
12-second access time and 500
mounts/dismounts per hour.
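The quoted unit capacity follows from the cartridge figures. As a
quick check, assuming 8-bit bytes and decimal units (10^6 megabits
per terabit), neither of which the text states explicitly:

```python
MEGABYTES_PER_CARTRIDGE = 175   # from the text
CARTRIDGES_PER_UNIT = 316       # from the text
BITS_PER_BYTE = 8               # assumption: 8-bit bytes


def unit_capacity_terabits(cartridges: int, megabytes_each: int) -> float:
    """Total capacity in terabits, using decimal (power-of-ten) units."""
    megabits = cartridges * megabytes_each * BITS_PER_BYTE
    return megabits / 1_000_000


m861_terabits = unit_capacity_terabits(CARTRIDGES_PER_UNIT, MEGABYTES_PER_CARTRIDGE)
print(round(m861_terabits, 2))  # prints 0.44
```

That is, 316 x 175 megabytes is 55,300 megabytes, or 442,400
megabits, which rounds to the 0.44 terabits given above.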

A spectrum of interconnection 
mechanisms was described (Howie 75) that 
included: 

• The host being entirely responsible for
special hardware characteristics of the
storage system device,

• the device characteristics being
translated (by emulation in the storage
system) to a device known by the host
operating system, and

• the storage system and host software 
being combined to create a general sys- 
tem. 

This has sometimes been termed moving
from tightly coupled to loosely coupled sys-
tems. Loosely coupled systems use message
passing between autonomous elements.

The evolution of the architectural view of
what constitutes a large storage system has
been shaped by the growth in sheer size of
systems, more rapid growth of interactive
rather than batch processing, the growth of
networks, distributed computing, and the
growth of personal computers, worksta-
tions, and file servers.

Many commercial systems have tracked 
growth rates of 60-100% per year over 
many years. As systems grow, a number of 
things change just because of size. It be- 
comes difficult for large numbers of people 
to handle tape reels, so automating the 
fetching and returning and the mounting 
and dismounting of reels becomes 
important. As size increases, it also
becomes more difficult for humans to 
decide which devices to use for load 
balancing. 

Because of this growth, early users of
storage systems were forced to do much of
the systems integration in their own site
environments. Large portions (software
and hardware) of many existing systems
(Gentile 71, Penny 73, Fletcher 75, Collins
82, Coleman 84) were developed by user
organizations that were faced with the
problem of storing, retrieving, and
managing trillions of bits and cataloging
millions of files. The sheer size of such
storage problems meant that only organi-
zations such as government laboratories,
which possessed sufficient systems engi-
neering resources and talent to complete
the integration, initially took on the
development task. These individualized
developments and integrations resulted in
storage systems that were heavily
intertwined with other elements of each
unique site.

These systems initiated an evolution in
storage products in which three stages are
readily recognizable today. During the first
stage, a storage system was viewed as a very
large peripheral device serving a single
system attached to an I/O channel on a
central processor in the same manner as
other peripheral devices. Tasks to catalog
the files and free space of the device,
manage the flow of data to and from it, take
care of backup and recovery, and the many
other file management tasks, were added as
application programs within the systems
environment. Many decisions, such as
when to migrate a file, were left to the user
or to a manual operator. If data was moved
from the storage system to local disk, two
host channels (one for each device) were re-
quired plus a significant amount of main
memory space and central processing ca-
pability (Davis 82).

During this stage, the primary effort in 
design was machine-room automation to 
reduce the need to manually mount and 
dismount magnetic tapes. 

The second stage (late 1970s to present) has 
been characterized by centralized shared 
service that takes advantage of the 
economies of scale and provides file server 
nodes to serve several, perhaps heteroge-
neous, systems (Svobodova 84).

This stage of the storage system evolution is
the one that is most prevalent today. The
storage system node entails using a control
processor to perform the functions of the
reference model in a storage hierarchy
(O'Lear 82). The cost of the storage system
makes it desirable to share these
centralized facilities among several
computational systems rather than provide
a storage system for each computational
system. This is especially true when
supercomputers become a part of the site
configuration.

This approach to providing storage has 
several advantages: 

• The number of processors that have
access to a file is larger than that
which can share a peripheral device.
(This type of access is not the same as
sharing a file, which implies
concurrent access.)

• Multiple read-only copies of data can
be provided, circumventing the need
for a large number of processors
having access to common storage
devices.

• Processors of different architectures
can have access to common storage,
and therefore to common data, if they
are attached to a network and use a
common protocol for bit-stream
transfer.

• The independence between the levels of
storage allows the inclusion of new
storage devices as they become com-
mercially available.

Some of the earliest systems in this shared 
service stage were at the Lawrence Berkeley 
National Laboratory (Penny 70) and 
Lawrence Livermore National Laboratory
(LLNL) (Fletcher 75, Watson 80).

The Los Alamos Common File System
(Collins 82, McLarty 84) and the system at
the National Center for Atmospheric
Research (Nelson 87, O'Lear 82) are more
recent examples of shared, centralized
storage system nodes.

The third stage is the emerging distributed
system. An essential feature of a distributed
system (Enslow 78, Watson 81a, 84, 88) is
that the network nodes are autonomous,
employing cooperative communication
with other nodes on the network. The
control processors of storage systems
developed during this stage provide this
capability.


The view of a storage system as a
distributed storage hierarchy is neither a
device nor a single service node, but is the
integration of distributed computing
systems and storage system architectures
with the elements that provide the storage
service distributed throughout the system.
The distributed computing community has
been very interested in the problems of
providing file management services, albeit
generally on smaller systems (Almes 85,
Birrell 82, Brownbridge 82, Donnelley 80,
Leach 82, Svobodova 84, Watson 81a).
Probably the best known example at the
workstation level is the SUN Microsystems
"network file server" (Sandberg 85).

Several elements are necessary for a system 
to be classed as "distributed" (Enslow 78): 

• A multiplicity of general-purpose
resource elements,

• the distribution of these elements,
logically and physically,

• a distributed (network) operating sys-
tem,

• system transparency (service requests 
by name only), and 

• cooperative communication among 
elements (nodes). 

Achieving all of these elements sounds dif-
ficult and expensive. The motivations most
often cited are extensibility, availability,
and costly resource sharing (LeLann 81).
Readily extensible systems permit the "hot
wiring" necessary in large systems that can
no longer afford downtime for cabling in
new elements. Extensibility also means
that individual elements can be upgraded
without disrupting the entire system.
System availability is obtained by
replicating system elements in a way that
permits graceful degradation. Sharing
costly elements occurs through
communications and networking.

The issues involved in designing
distributed systems with the
characteristics outlined above were
discussed by Dr. Richard W. Watson of the
Lawrence Livermore National Laboratory
at the Eighth IEEE Mass Storage
Symposium in Tucson, Arizona, May 1987.
He stated (Watson 87) that the long-range
goal is to design systems in which
"mainframes, minicomputers, worksta-
tions, networks, multiple levels of storage,
and input/output systems are viewed as
elements of a logically single distributed
computer whose resources are managed by
and accessed through a single distributed
operating system."

Individual operating systems have their
own way of handling files. One reason for
requiring a distributed operating system is
to provide a single logical file and naming
system. This distributed file system should
be accessed by name only; that is, the
naming and heterogeneous features of dif-
ferent component parts should be trans-
parent to the user. Logically, the distributed
storage system should have infinite
capacity and unlimited file size. This is
obtained through the use of migration
among the distributed storage elements
that make up the storage hierarchy. The
different levels of storage probably have
different storage characteristics and costs.

Other design goals include high reliability
and availability, high performance (low 
delay and high throughput), mandatory and 
discretionary access control, file sharing 
and safe concurrent access (Lantz 85). and 
accounting and administrative controls 
(Mullender 84). 

It now appears that an attainable goal is to
design interconnected systems, whose
subsystems can be produced by a number of
vendors, such that the file service is uni-
form from the user's local level through all
levels of the on-line hierarchy to shelf
storage. Internally, the distributed hier-
archical storage system will consist of
multiple levels of storage such as bulk
semiconductor memories, magnetic disks,
magnetic tape, optical disks, automated
media libraries, and manual vaults. Such a
system is currently under development at
Lawrence Livermore National Laboratory
(Coleman 84, Foglesong 90, Gary 90, Hogan
90).

2.2 Motivation 

The central architectural features of the 
reference model and the motivation for 
them can be summarized as follows: 


• An object-oriented description allows
the identification of a modular set of
standard services and standardized
client/server interfaces. The reference
model servers are potentially viable
commercial products and are building
blocks for higher-level services and
recursive integration in centralized,
shared, or distributed hierarchical
storage systems. This integration can
be done within single-vendor systems,
by third-party, value-added system in-
tegrators, or by end-user organiza-
tions. The object-oriented modularity
hides implementation details,
allowing many possible
implementations in support of the
standard abstract objects and
interfaces (Booch 86).

• For the storage system to be integrated
with applications and operating
systems supporting many different
internal file structures, the abstract
object visible to storage-system clients
is an uninterpreted string of bits and a
set of attributes.

• The separation of human-oriented
naming from machine-oriented file
identifiers allows integration with
current and future operating systems
and site-dependent naming systems.
This implies separation of the name
server as a separate module associated
with the reference model (Watson 81b).

• The separation of access rights control
as a site-specific module with a stan-
dard interface to the storage system
accommodates the many operating
systems and site-dependent access
control mechanisms in existence.

• The separation of the request and data
communication paths supports
existing practices and the need for
third-party control of transfers
between two entities by direct data
transfers from source to sink, as well
as data transfer redirection and
pipelining through such data
transformations as encryption,
compression, and check-summing.

• The separation of the site manager al- 
lows site-dependent policies and status 
to be managed. Provision is made for 
standard site-management interface 
functionality. 

• Inclusion of a migration server within
the file server allows each file server to
be self-contained and file-migration
policies for each server to be estab-
lished separately. It also facilitates
building a hierarchical storage system
supporting automatic migration be-
tween servers. The general goal is to
cache the most active data on the
fastest storage servers and the least
active on storage servers with the
lowest cost-per-bit medium.

It is envisioned that the modules of the
reference model can be integrated in
various combinations to support a variety
of storage needs from single storage
systems to distributed, hierarchical
systems supporting automatic file
migration. Vendors can build and market
individual standard modules or integrated
systems supporting standard interfaces and
functionality. Hopefully, the development
of standards will increase markets and lead
to modules and systems manufactured in
larger numbers, thus reducing costs as a
result of mass production economies.

To better understand the modularization 
and the requirements placed on interfaces 
but not to force a particular design 
philosophy, the discussion in this 
document does not restrict itself to external 
interfaces and services as might be expected 
of a reference model. The intent is not to 
standardize the internal structure of 
modules, since this is implementation- and
vendor-specific, but to provide additional
understanding to aid the model building,
interface standardization, and implemen-
tation processes.


2.3 Reference Model Architecture 

2.3.1 Abstract Objects

To follow the description of the reference
model, there are several concepts that
should first be established. These concepts
employ the properties of abstract objects
(Watson 81a), which have been succinctly
listed as:

• Objects are an instance of a type (file,
process, directory, account, etc.). As
such, an object type is defined by:

- An identifier.

- A logical representation visible at
an interface (e.g., a logical represen-
tation of a file is a set of attributes
and a data segment of uninterpreted
bits).

- A set of operations or functions and
associated parameters presented at
the interface to create, destroy, or
manipulate the object.

- Specification of sequences of opera-
tions that are allowed.

• Objects are managed by servers. There 
can be many servers for a given type 
(e.g., there can be many file managers).

• Objects are of two basic classes, active
and passive. To be manipulated,
passive objects (such as files,
directories, or accounts) must be acted
on by requests from active objects
presented at the server interfaces.
Active objects, which are mainly
processes, can directly change aspects
of their own representation. Active
objects can play either or both of two
roles, a client role accessing a service,
and a server role providing a service or
managing a type of object.

• Objects are named via an
identification scheme with a machine-
oriented name that is unique
throughout an environment. This
identification scheme may be used in
conjunction with protection and
resource management schemes.
Human-oriented naming is
implemented by separate name servers
that associate mnemonic, human-ori-
ented names with the machine-
oriented object identifiers. Higher-
level file services might integrate the
name and file services.

• Access to objects is controlled by the
server through access lists, capabili-
ties, or other techniques.
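The properties listed above can be made concrete with a small
sketch. It is illustrative only: the class names Bitfile and
BitfileManager, the in-memory dictionaries, the random UUID
identifier, and the simple access list are all assumptions made
for this example and are not prescribed by the reference model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Bitfile:
    """Passive object: an identifier, attributes, and uninterpreted data."""
    object_id: str
    attributes: dict = field(default_factory=dict)
    data: bytes = b""


class BitfileManager:
    """Server managing objects of one type; all access goes through it."""

    def __init__(self):
        self._objects = {}
        self._acl = {}  # object_id -> set of client names allowed access

    def create(self, owner: str, data: bytes = b"") -> str:
        object_id = uuid.uuid4().hex  # machine-oriented, opaque name
        self._objects[object_id] = Bitfile(object_id, {"owner": owner}, data)
        self._acl[object_id] = {owner}
        return object_id

    def _check(self, client: str, object_id: str) -> None:
        # Allowed sequences: nothing may follow destroy (or precede create).
        if object_id not in self._objects:
            raise KeyError("no such object (destroyed or never created)")
        if client not in self._acl[object_id]:
            raise PermissionError(f"{client} may not access {object_id}")

    def read(self, client: str, object_id: str) -> bytes:
        self._check(client, object_id)
        return self._objects[object_id].data

    def destroy(self, client: str, object_id: str) -> None:
        self._check(client, object_id)
        del self._objects[object_id]
        del self._acl[object_id]


manager = BitfileManager()
oid = manager.create("alice", b"\x01\x02")
print(manager.read("alice", oid))
```

A capability scheme could replace the access list without
changing the interface seen by clients, which is the kind of
substitution the object model is meant to allow.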

2.3.2 Client/Server Properties

The client/server model (Watson 81a) is an
object-oriented paradigm. Simply stated,
both clients and servers are active abstract
objects in which the client requests services
from the server through a specified
interface. The word client is used to mean
the program that accesses some service. The
word user is reserved to mean the human at
the terminal. A client is an agent of a user.
The server is a provider of a service. Access
to server-supported objects or services is
only through defined server interfaces, thus
hiding implementation details to provide
transparency. Both the client and the
server may be processes or collections of
processes. These processes are not
necessarily associated with any particular
host machine. We describe the client/server
interactions in terms of messages, but it is
understood that local or remote procedure
calls (Birrell 82) or other communications
paradigms are possible.

A server may be thought of as a collection of 
one or more tasks or processes 
(concurrently executing instruction 
streams). A server may include request 
processing and other tasks supporting 
concurrent handling of requests from 
many clients. Clients may also be 
constructed as many cooperating, 
concurrent tasks. 

Client and server processes interact by
sending each other messages, in the form of
requests and replies. A message is the
smallest unit of data that can be sent and
received between a pair of correspondents
for a meaningful action to take place. A
client process accesses a resource by
sending requests containing the operation
specification and appropriate parameters
from one of its ports to a server port. A
given process can operate in both server and
client roles at different times (Watson 84).
For example, during the migration of files,
a file server that manages magnetic disks
can play the role of client to a file server
that manages magnetic tapes. Another ex-
ample is a name server that stores its cat-
alogs in a file server.
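The dual role described above can be sketched as follows. This is
a minimal illustration, not a protocol definition: the Request and
Reply message shapes, the operation names "store" and "retrieve",
and the FileServer class are hypothetical, and a direct method
call stands in for sending a message to a server port.

```python
from dataclasses import dataclass


@dataclass
class Request:
    operation: str  # operation specification
    params: dict    # associated parameters


@dataclass
class Reply:
    ok: bool
    result: object = None


class FileServer:
    """Answers requests as a server; issues them as a client when migrating."""

    def __init__(self, name: str):
        self.name = name
        self.files = {}

    def handle(self, request: Request) -> Reply:
        """Server role: act on one self-contained request message."""
        if request.operation == "store":
            self.files[request.params["id"]] = request.params["data"]
            return Reply(True)
        if request.operation == "retrieve":
            file_id = request.params["id"]
            if file_id in self.files:
                return Reply(True, self.files[file_id])
            return Reply(False, "not found")
        return Reply(False, "unknown operation")

    def migrate(self, file_id: str, target: "FileServer") -> Reply:
        """Client role: ask another file server to store one of our files."""
        data = self.files[file_id]
        reply = target.handle(Request("store", {"id": file_id, "data": data}))
        if reply.ok:
            del self.files[file_id]  # drop the local copy only on success
        return reply


disk = FileServer("disk")
tape = FileServer("tape")
disk.handle(Request("store", {"id": "f1", "data": b"abc"}))
disk.migrate("f1", tape)
print(tape.handle(Request("retrieve", {"id": "f1"})).result)
```

The disk server uses exactly the interface that any other client
of the tape server would use, so the tape server needs no special
knowledge that the request came from a peer.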

A distinction is drawn between the words
server and service. A service may include
several servers (Svobodova 84). For ex-
ample, a directory service might be im-
plemented by having separate name servers
for objects such as files and for other ob-
jects such as users, addresses or printers.
On the other hand, one might implement a
universal directory to provide the whole
directory service (Lantz 85), or one might
choose to implement the file service defined
by the ISO-OSI Virtual Filestore (DIS 8571),
where this reference model serves the
unstructured file segment. Thus a complete
file service will likely consist of name
servers and multiple file servers.

2.3.3 Reference Model Modules

The primary reference model modules,
shown in Figure 1, are:

• The bitfile* server, which handles the
logical aspects of bitfile storage and
retrieval,

• the storage server, which handles the
physical aspects of bitfile storage, and

• the physical volume repository, which
provides manually or robotically re-
trievable shelf storage of physical
media volumes.

Closely related to these modules are:

• The bitfile client, which is the pro-
grammatic agent of the user required to
convert user desires into bitfile re-
quests to the bitfile server and data
transfer commands to the bitfile
mover,

• the bitfile mover, which provides the
components and protocols for high-
speed data transfer,

• the name server, which provides the
retention of bitfile IDs and the con-
version of human-oriented names to
bitfile IDs, and

• the site manager, who monitors oper-
ations, collects statistics, and estab-
lishes policy and exerts control over
policy parameters and site operation.

*The word "bitfile" was coined by the IEEE-
CS Technical Committee on Mass Storage
Systems and Technology to refer to a bit
string that is completely unconstrained by
size and structure; it was coined to relieve
those who worked on the model from being
bound by any particular file management
system.

These modules are not directly associated
with any particular hardware or software
products. The modularity of the reference
model defines a virtual store for bitfiles.
The storage system can be implemented
with many levels of storage hierarchy,
including a physical volume repository.
The structure of the model permits
standard interfaces and multiple instances
of modules, and thus should facilitate more
economical implementation of many
forms of storage architectures. There may
be many different instances of bitfile server
and storage server combinations in which
storage servers need not be of the same
technology and can form a hierarchy.

The bitfile client represents the program
object or agent that accesses bitfiles. This
agent is not the application but acts for the
application. The bitfile client can take
many forms depending on how the storage
system is implemented and integrated into
a particular user environment; it might be
one or more application programs or be
functionally supported within an operating
system to facilitate access to storage. The
bitfile client may run on personal
computers, workstations, or on larger
machines. The bitfile client can also be a
part of a data acquisition system needing
the services of a storage system. The bitfile
client can locate bitfiles by use of a name
server. The user's human-oriented bitfile
names are mapped to bitfile IDs and bitfile
server addresses by the name server.
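The mapping performed by the name server can be sketched as a
simple catalog lookup. Everything named here is hypothetical and
chosen only for illustration: the NameServer and BitfileLocation
classes, the path-like human name, and the address string format.

```python
from typing import NamedTuple, Optional


class BitfileLocation(NamedTuple):
    bitfile_id: str      # opaque, machine-oriented identifier
    server_address: str  # address of the bitfile server holding it


class NameServer:
    """Associates human-oriented names with bitfile IDs and server addresses."""

    def __init__(self):
        self._catalog = {}

    def register(self, name: str, bitfile_id: str, server_address: str) -> None:
        self._catalog[name] = BitfileLocation(bitfile_id, server_address)

    def resolve(self, name: str) -> Optional[BitfileLocation]:
        """Return the location for a name, or None if the name is unknown."""
        return self._catalog.get(name)


names = NameServer()
names.register("/projects/climate/run42", "9f2c4e1a", "bitfile-server-1")
location = names.resolve("/projects/climate/run42")
print(location.bitfile_id, location.server_address)
```

Because the bitfile server sees only the opaque ID, a site can
replace this catalog with its own naming scheme without touching
the storage interface.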

It is the interaction of the bitfile client with
the bitfile server, the bitfile server in-
teraction with the storage servers, and the
storage server interaction with the physical
volume repository that are of particular
interest. There may be any number of bit-
file clients in the general system envi-
ronment of a site. Furthermore, bitfile
servers or other entities of the total storage
environment, such as name servers, site
managers, or migration modules, may op-
erate in the role of bitfile clients when they
need storage service themselves.

A bitfile server handles the logical aspects
of the storage and retrieval of bitfiles. The
bitfile server's abstract object is the bitfile,
identified by a globally unique machine-
oriented bitfile ID. A bitfile is a set of
attributes (state fields) and an uninter-
preted, logically contiguous segment of data
bits.

A bitfile server may keep track of the bit-
files stored in one or more storage servers.
A single bitfile server may control a
hierarchy and need the services of several
storage servers. As an alternative, a single
bitfile server may handle the bitfiles in a
single level of the storage hierarchy or a
single storage technology; multiple bitfile
server-storage server pairs simplify ex-
tensibility and evolution.

The bitfile server accepts requests from
bitfile clients to create, destroy, store, and
retrieve bitfiles, and to modify and inter-
rogate the bitfile attributes needed for sys-
tem management. The bitfile server
contains a request processor to parse the
requests and control the sequence of
actions necessary to fulfill the requests.
Before permitting access to a bitfile, the
bitfile server authenticates the access
rights of the requestor.

The bitfile server communicates action
commands to the storage server. Each
bitfile server contains a migration
manager to prevent overflow of the storage
space for which it is responsible. The
migration manager knows which bitfile
server is used to offload bitfiles, as
established by the site manager and by
migration and caching policies.

The storage server handles the physical
aspects of bitfile storage and retrieval, and
presents the image of perfect media to the
bitfile server. (The capacity of the media,
influenced by imperfections, may be visible
to the bitfile server.) The storage server's
abstract objects, as seen by the bitfile
server, are logical volumes made up of an
ordered set of bit string segments.

Physical volumes and volume serial numbers
are not visible to the client. The site
manager, however, may have privileged
storage server commands where physical
volumes and devices are treated as visible
objects of the storage server. Volumes and
devices are identified by volume and device
IDs.

Bit stream segments are identified by
segment descriptors. These segments
represent how the storage server has allocated
space for the bitfile data blocks. Each
segment is identified by a descriptor
generated by the storage server, and the ordered
set is retained as a bitfile attribute by the
bitfile server. The storage server must
internally map bit string segments to real
physical volumes (removable or not) and
addresses where the bitfiles are stored,
provide read/write (and some error
management) of those volumes, and be able to
access a bitfile mover to transmit and
receive bitfile data blocks. One or more
logical volumes may be mapped to a given
physical volume, or one logical volume
may be mapped to several physical
volumes. Thus, a storage server contains
storage devices, device-specific controllers
that map bitfiles to bit string segments on
physical volumes, and a means for
handling and managing physical volumes
on the storage devices.

The physical volume repository server
manages a library that stores physical
volumes such as tape reels, tape cartridges,
or optical disk platters. Physical volumes,
identified by physical volume IDs, are its
abstract objects. A physical volume
repository server can be used by one or
more storage servers.

The site manager is a client process that
can generate and send ordinary and
privileged requests to the other servers to
set policy parameters, install logical and
physical volumes (import, export), obtain
statistics and status, run diagnostics, etc.
The various clients and servers are
interconnected through a communication
service, which must handle all of the
inter-process communications involved in
requests and replies, as well as the
high-speed transport of bitfiles. The
elements of the communication service may be
distributed through the many physical
processors of the site. The reference model
does not specify the details of this service
but does assume its existence whether by
procedure call, remote procedure call, or
message passing. In particular, the model
does not require homogeneity of this
communication service. The communication
service can include movement across I/O
channels as well as across networks. The
model assumes the ability to separate data
movement from request movement. All that is
required is that entities that must
communicate are, in fact, able to do so and
that standard interfaces are supported. The
model has separated authentication details
to allow each site to install the form
appropriate to that site.

Figure 1

The Storage System Reference Model

Existing storage systems often include a
high-performance data path of some type,
often specially designed, to handle the
high-volume, high-speed data flow between
the bitfile client and the storage server. In
the reference model, the need for a
high-speed data path is incorporated as part
of the communication service referred to as
the bitfile mover. This path has been
separated from the request path to
correspond to existing practice; to
facilitate third-party control of transfers;
to facilitate insertion of data
transformations such as encryption,
compression, and check-summing; and to
support transfer redirection. If a general
network is used for the communication
service, it has to account for this need for
high transmission speed of bitfiles as well
as the communication of requests and
replies.





3. Detailed Description of the Reference Model


This description of the reference model
discusses the entities shown in Figure 1 in
more detail. In describing the functions
accepted by each module, input parameters
that are common to all functions are
deleted for clarity. These parameters
include the identification of the client
making the request or other access-control
information, accounting information, and
a transaction identifier. Similarly deleted
are the transaction identifier and the
success/fail indication common to all
responses. If functions fail, error
information will replace the expected
responses. Optional parameters that can be
defaulted are enclosed in square brackets
(Miller 88a, 88b).
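
The common fields described above can be pictured as an envelope around every request and response. The following is only a sketch under stated assumptions: the class names, field names, and the `respond` helper are hypothetical and are not defined by the reference model, which leaves message formats to the implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    # Common fields that the function lists in this section leave out.
    client_id: str
    accounting_info: str
    transaction_id: int
    function: str                      # e.g. "Create", "Retrieve"
    params: dict = field(default_factory=dict)


@dataclass
class Response:
    transaction_id: int
    success: bool
    result: Optional[dict] = None      # the expected response parameters
    error: Optional[str] = None        # replaces the result on failure


def respond(request: Request, handler) -> Response:
    """Run handler and wrap its outcome in the common response fields."""
    try:
        result = handler(**request.params)
        return Response(request.transaction_id, True, result=result)
    except Exception as exc:
        # Error information replaces the expected response.
        return Response(request.transaction_id, False, error=str(exc))
```

The transaction identifier is echoed in every response so that a client can match replies to outstanding requests, including later Status and Abort requests.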

3.1 Bitfile Client

The bitfile client is the collection of
hardware and software at a user node
within the site to permit that node to use
the storage system. Bitfile clients are
responsible for providing the storage
system interface to users at the terminal or
to application processes by:

• translating user and application
requests for storage services into bitfile
server requests, and

• providing communication with the
appropriate bitfile servers and movers
as determined by the name server
mapping.

The bitfile client may be library routines
within the application, an interface process
(local or remote), or routines within an
operating system kernel, and it may
combine the services provided by multiple
bitfile servers, bitfile movers, and name
servers to form the higher-level abstract
objects of an integrated storage service. The
syntax and semantics of messages that flow
between the bitfile client and the bitfile
server should be the same regardless of the
type of bitfile client. These messages
identify the bitfile to be acted upon and specify
which of the basic commands and
parameters are to be processed.


When a bitfile is created, the bitfile ID is
generated by the bitfile server and passed
back to the bitfile client for retention.
When the user accesses an existing bitfile by
name, the bitfile client obtains the bitfile
ID from a name server or some other data
base system.

Several arrangements of the bitfile client
are possible. In the first arrangement, the
'kernel view', all bitfiles are logically
local. The bitfile client is a program in a
processor with its own operating system
and local storage. The storage and site-
management capabilities of the local
operating system are what the user sees
with respect to how his bitfiles are handled.
To the kernel of the operating system is
added code that permits the operating
system to determine if a bitfile being
requested is locally stored or remote
(Sandberg 85). If it is remote, this special
code in the operating system kernel makes
up the messages for the bitfile server and
possibly for the name server. Alternatively,
the local/remote distinction can be made in
a library routine, and the library code can
act as the bitfile client (Brownbridge 82). In
either case, the bitfile is put into a local
user buffer or is cached in an operating
system buffer or a local bitfile and, except
for a possible delay, the remote transfer is
transparent to the user.

In a pure 'client/server' view, all bitfiles
are logically remote. Here, all references to
bitfiles are translated into messages for a
bitfile server, using routines in a library or
other run-time support facility, such as a
remote invocation system. The bitfile
server might be local or remote; in a
diskless workstation, for example, the bitfile
server is remote. Mapping human-oriented
names to machine-oriented bitfile IDs is a
separate, explicit step or is hidden in the
run-time support facility (Svobodova 84,
Watson 84).

In a third view, the systems that create and
store bitfiles are separate from the systems
that retrieve and process the data contained
in the bitfiles. Such might be the
relationship between a data acquisition system
that puts bitfiles into the storage system and
the systems in a data processing center that
use them. While there is no name server per
se, means must be provided to retain
bitfile IDs when they are returned from the
bitfile server and to pass them in some
understandable way (e.g. using a data base
management system) to the processing
systems. Such systems must take care to back
up their records; if the bitfile IDs are lost,
the bitfiles become lost objects in the
storage system.

3.2 Name Server

The development of distributed systems has
caused a much more in-depth look at
schemes for identifying objects. While the
advent of distributed systems brought this
about, the requirements recognized are not
restricted to distributed systems. They
apply to all systems, especially those that
grow in size. Dealing properly with naming
is central to achieving the location
transparency needed in a distributed system.
Thus, it is advantageous to look at some of
the properties of identification schemes.

There are many possible ways to designate
a desired object (Watson 81b):

• by an explicit name or address (object x
or object at address x),

• by content (object with value or value
expression x),

• by source (all my files),

• by broadcast identifier (all objects of a
class),

• by group identifiers (participants in
class x),

• by route (all objects found at the end of
path x),

• by relationship to a given identifier
(all previous sequence numbers), etc.

A useful informal distinction between three
important classes of identifiers widely used
in system design (names, addresses, and
routes) is (Shoch 78):

• the name of a resource is what we seek,

• an address indicates where it is, and

• a route tells how to get there.

One should not examine such informal
definitions too closely. Names, addresses, and
routes can occur at all levels of the
architecture. Names used in the inter-process
communication layer have often been
called such terms as ports, or logical or
generic addresses. A human-oriented chain
or path name can be thought of as a 'route'
through a directory. An important idea is
that identifiers at different levels of the
architecture referring to the same object
must be bound together in contexts,
statically or dynamically. Later they must
be resolved using these contexts to locate
the named objects.

There are important system benefits if
every bitfile ID is unique (Leach 82). Less
obvious are the system-wide ramifications
of the total naming system, especially the
choice of the particular mechanism used to
create unique bitfile IDs and the
mechanism to associate application-
dependent, human-oriented names with
them. Of the many goals and implications
of identification schemes enumerated by
(Watson 81b), the goals that are the most
pertinent to the reference model are
abstracted and discussed below.

The naming system should: 

• Support at least two levels of
identifiers, one convenient for people and
one convenient for machines. The
latter is the bitfile identifier. The
former will be handled by site or
operating system specification of the
name servers or by imbedding a name
service in a higher level file service.

The separation of identifier levels is
very important because a storage
system must be integrated with many
types of heterogeneous applications
and operating and storage systems
(centralized and distributed), each
supporting its own form of human-
oriented naming scheme. The reference
model provides a clean separation of
mechanism for these two levels of
identifiers and allows their easy
integration. (When the client is
responsible for the use of the bitfile ID,
there is the potential to create lost
objects in a system and thus
mechanisms must also be included to assist
the system in identifying them so that
the resources they use can be
reclaimed.)

• Support distributed generation of
machine-oriented, globally-unique bitfile
identifiers. A variety of mechanisms
are available to support this need
(Leach 82, Mullender 84, Watson 81b).
One mechanism is to include both a
bitfile server ID and a time stamp in
the identifier. This structure,
containing node or server boundary
information, is at most a hint to
applications as to where to send access
requests and should not restrict
migration. A machine-oriented identifier is
a bit pattern easily manipulated and
stored by machines and may be
directly useful with protection,
resource management, and other
mechanisms. A human-oriented
identifier, on the other hand, is
generally a character string that is
readable by humans and that has
mnemonic value. Directory path
names are a common mechanism
(McLarty 84).

• Provide a storage system viewed as a
global space of identified objects rather
than as a space of identified host
computers containing locally-identified
objects. Similarly, the identification
mechanism should be independent of
the physical connectivity or topology
of the system. That is, the boundaries
of physical components and the
connection among them as a network,
while technologically and
administratively real, are invisible in
object identifiers. Further, an object's
name should be independent of client
or server location. Users should be able
to discover or influence an object's
location.

• Support relocation of objects. The
implication here is that there be at least
two lower levels of identifiers and that
the mapping and binding between them
be dynamic. For example, bitfiles are
expected to migrate. Therefore, the
bitfile IDs should not contain storage
addresses, and there must be
mechanisms for updating the appropriate
context (e.g. directories and tables)
when objects are moved.

• Support use of multiple copies of the
same object. For example, a file may be
cached on disks at one or more hosts,
on staging disks, or it may be stored on
an archival volume. If the value of the
object is only going to be read or
interrogated, one set of constraints is
implied. If values are to be written or
modified, tougher constraints must be
imposed to achieve consistency
between the contents of the copies.
Policies of enforcement of such
constraints are handled using the basic
locking services specified by the
reference model.

• Allow multiple local, user-defined
(human-oriented) names for the same
object by allowing multiple mappings
of a given bitfile identifier within the
services of one or more name servers.

• Support two or more resources sharing 
a single instance of a storage object 
without identifier conflicts. 

• Minimize the number of independent
identification systems needed across
and within architectural levels.
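
Two of the goals above lend themselves to a short illustration: distributed generation of unique bitfile IDs from a server ID and a time stamp, and several human-oriented names bound to one ID. This is a minimal sketch, not the mechanism of any particular system. The class names are hypothetical, and the sequence number is an added assumption (not in the text) that keeps IDs distinct when the clock returns the same value twice.

```python
import itertools
import time


class BitfileIdGenerator:
    """Builds IDs from a server ID, a time stamp, and a sequence number."""

    def __init__(self, server_id, clock=time.time_ns):
        self.server_id = server_id
        self._clock = clock
        self._seq = itertools.count()

    def new_id(self):
        # Assumption: the sequence number guards against repeated clock values.
        return (self.server_id, self._clock(), next(self._seq))


class NameServer:
    """Maps human-oriented names to machine-oriented bitfile IDs."""

    def __init__(self):
        self._names = {}

    def bind(self, name, bitfile_id):
        self._names[name] = bitfile_id   # several names may share one ID

    def resolve(self, name):
        return self._names[name]
```

Because each server stamps its own ID into the identifier, two servers can generate IDs independently without coordination. As the text notes, the embedded server ID is at most a hint: a client should not assume the bitfile still resides on the server that created it.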

3.3 Bitfile Server

A bitfile server (Falcone 88) handles the
logical aspects of bitfiles that are
physically stored in one or more storage
servers of the storage system. As shown in
Figure 2, the major components of a bitfile
server are a bitfile server request processor,
a bitfile descriptor manager and its
descriptor table, a migration manager, a
bitfile ID authenticator, and a space limit
manager and its space limit table.

The bitfile server accepts requests for
service on bitfiles from the bitfile client,
site managers, migration managers, and
other bitfile servers. A discussion of the
operations that bitfile clients can request
of the bitfile server regarding the bitfile
follows. The function parameters are shown
in Table 1.





3.3.1 Bitfile Server Commands

Abort

The client requests that a previous 
request be aborted. 

Create 

This request is used to establish a new
entry in the bitfile server's descriptor
table. The requestor must prove his
right to do so, and when this is
established, he receives a new bitfile ID from
the bitfile server, which may then be
saved for use when the bitfile is
accessed later.

Destroy 

The client requests that a bitfile
descriptor be removed from the bitfile
descriptor table. The space allocated to
the bitfile within a storage server is
deallocated and, if the medium can be
rewritten, the storage server returns it
to the free-space list.

Figure 2

The Bitfile Server

Erase

This request erases data on a storage
server with the specified erasure
pattern. If only a segment is to be
erased, then the client must specify the
length of the segment and an optional
offset field giving the displacement of
the segment into the bitfile.

Lock 

The client requests that a bitfile be
locked for read access or for read/write
access.






Modify 

The client requests a change in one or
more attributes of a bitfile contained
in the bitfile descriptor table.

Query

The client requests information about
a bitfile or a bitfile's attributes from
the bitfile descriptor table.

Retrieve 

The client requests that a bitfile or a
segment of a bitfile be moved from the
storage server medium to the bitfile
client or application buffer. If only a
segment is to be moved, then the
internal starting bit address (offset) and
the bit string length of the segment to
be transferred must be specified. (The
data transfer is on a separate logical,
and perhaps physical, path from the
request/response path; the data block
itself is not part of the response.)

Status

The client requests the status of a
previous request made by the client.
Although the way in which status is
implemented is site dependent,
generally a transaction ID must be generated
to support the status request.

Store 

The client requests that a bitfile data
block be moved from a bitfile client or
application buffer to the storage server
medium. This request may include the
ability to append a segment at the end
of an existing bitfile or to update a
physically specified segment of an
existing bitfile. (The data block itself is
not part of the request.)

Unlock 

The client requests that any locks held 
be released. 

The interface on which this list of
commands can be sent is shown in Figure 1
as bitfile client requests and bitfile server
replies.
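
To make the semantics of Create, Store, Retrieve, and Destroy concrete, the following is a toy, in-memory sketch. It makes several simplifying assumptions that are not part of the reference model: lengths and offsets are counted in bytes rather than bits, the data block is passed directly instead of travelling on a separate data path through a bitfile mover, and locking, authentication, and descriptors are left out. The class and method names are hypothetical.

```python
import itertools


class ToyBitfileServer:
    """In-memory stand-in for a bitfile server (bytes instead of bits)."""

    def __init__(self):
        self._data = {}                    # bitfile ID -> bytearray
        self._next_id = itertools.count(1)

    def create(self):
        bitfile_id = next(self._next_id)
        self._data[bitfile_id] = bytearray()
        return bitfile_id                  # the client must retain this ID

    def store(self, bitfile_id, block, offset=None):
        data = self._data[bitfile_id]
        if offset is None:                 # default: append at the end
            offset = len(data)
        if offset > len(data):
            raise ValueError("offset beyond end of bitfile")
        data[offset:offset + len(block)] = block
        return len(block)                  # amount transferred

    def retrieve(self, bitfile_id, offset=0, length=None):
        data = self._data[bitfile_id]
        end = len(data) if length is None else offset + length
        return bytes(data[offset:end])

    def destroy(self, bitfile_id):
        del self._data[bitfile_id]
```

A short session might look like this: `b = server.create()`, then `server.store(b, b"hello")` appends five bytes, and `server.retrieve(b, offset=0, length=2)` returns the first two. After `server.destroy(b)` the ID is no longer valid.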

The site manager will have additional
privileged requests to control allocated
space limits, examine all bitfile directory
fields, set access control and migration
policy parameters, etc.


3.3.2 Bitfile Request Processor

The bitfile server request processor accepts,
parses, and executes request messages from:

• the bitfile clients, to store, retrieve,
and manage bitfiles,

• the site manager, to provide
monitoring and control,

• the migration manager, to move
bitfiles to other bitfile servers, and

• other bitfile servers, to support
migration.

The request processor is therefore
essentially the sequencer and controller of
actions within the bitfile server and the
interface to the other storage system modules.
Requests must be scheduled to provide the
best possible response to the bitfile clients
while optimizing the use of the available
resources. Client-specified priority, bitfile
size, and storage server availability may
affect the request scheduling. These actions
require a significant amount of processing.
In executing a request, the request
processor may interact with the bitfile
descriptor manager to retrieve, create, or
update bitfile attribute information, and
with a storage server to allocate or release
logical volume space for bitfile storage and
to store and retrieve bitfiles. To select a
storage server when a bitfile is created, the
request processor must have information
about the bitfile (bitfile size, the response
desired, the protection and reliability
desired, the type of storage desired, etc.) and
must match this information with the
characteristics of the available storage
servers. Bitfile clients might be able to
specify a specific storage server or logical
volume as well.
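
The matching step described above, comparing what a new bitfile needs with what each storage server offers, might be sketched as follows. The attribute names and the "fastest qualifying server wins" rule are assumptions chosen for illustration; the reference model does not prescribe a selection policy.

```python
from dataclasses import dataclass


@dataclass
class StorageServerInfo:
    name: str
    storage_type: str        # e.g. "disk", "tape"
    free_bits: int
    access_seconds: float    # typical time to the first bit
    available: bool = True


def select_storage_server(servers, size_bits, max_access_seconds,
                          storage_type=None):
    """Return the fastest available server that satisfies the request."""
    candidates = [
        s for s in servers
        if s.available
        and s.free_bits >= size_bits
        and s.access_seconds <= max_access_seconds
        and (storage_type is None or s.storage_type == storage_type)
    ]
    if not candidates:
        raise LookupError("no storage server matches the request")
    return min(candidates, key=lambda s: s.access_seconds)
```

A real request processor would weigh more attributes (protection, reliability, client priority), but the shape is the same: filter by hard requirements, then rank the survivors.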





Table 1

Bitfile Server Functions

Function   Parameters                         Response

Abort      Transaction ID

Create     [Initial length in bits]           Bitfile ID
           [Maximum length in bits]
           [Attribute value/name pair list]

Destroy    Bitfile ID

Erase      Bitfile ID                         Number of bits erased
           [Offset]
           [Length]
           Erasure pattern

Lock       Bitfile ID
           Lock type

Modify     Bitfile ID
           New attribute name/value pairs

Query      Bitfile ID                         Attribute name/value pair list
           Attribute name list

Retrieve   Bitfile ID                         Number of bits transferred
           [Offset]
           [Length]
           Data transfer sink ID

Status     Transaction ID                     Transaction status

Store      Bitfile ID                         Number of bits transferred
           [Offset]
           [Length]
           Data transfer source ID

Unlock     Bitfile ID
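
The parameter lists of Table 1 can also be written down as data, which lets a request processor reject malformed requests before acting on them. The sketch below is one reading of the table, with bracketed entries treated as optional; the snake_case parameter names and the `check_request` helper are hypothetical.

```python
# (required inputs, optional inputs) for each function in Table 1.
BITFILE_SERVER_FUNCTIONS = {
    "Abort":    (["transaction_id"], []),
    "Create":   ([], ["initial_length", "maximum_length", "attributes"]),
    "Destroy":  (["bitfile_id"], []),
    "Erase":    (["bitfile_id", "erasure_pattern"], ["offset", "length"]),
    "Lock":     (["bitfile_id", "lock_type"], []),
    "Modify":   (["bitfile_id", "new_attributes"], []),
    "Query":    (["bitfile_id", "attribute_names"], []),
    "Retrieve": (["bitfile_id", "sink_id"], ["offset", "length"]),
    "Status":   (["transaction_id"], []),
    "Store":    (["bitfile_id", "source_id"], ["offset", "length"]),
    "Unlock":   (["bitfile_id"], []),
}


def check_request(function, params):
    """Raise ValueError if params do not fit the function's parameter list."""
    if function not in BITFILE_SERVER_FUNCTIONS:
        raise ValueError(f"unknown function: {function}")
    required, optional = BITFILE_SERVER_FUNCTIONS[function]
    missing = [p for p in required if p not in params]
    extra = [p for p in params if p not in required + optional]
    if missing or extra:
        raise ValueError(f"{function}: missing={missing} unexpected={extra}")
```

Note that Retrieve and Store name a data transfer sink or source rather than carrying the data, which reflects the separation of the request path from the data path.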



The request processor is responsible, using
the bitfile ID authenticator, for the security
and integrity of the access to bitfiles, and
for synchronizing the sharing of bitfiles
through its locking services. The bitfile
request processor collects accounting data
from all affected sources regarding each
transaction and sends them to the account
service. The request processor also
communicates with the space limit manager to
determine that the resources assigned to a
particular account are not overdrawn.


3.3.3 Bitfile Descriptor Manager and 
Descriptor Table 

State and attribute information for each
bitfile is kept in records in a descriptor
table. Each record is called a bitfile
descriptor. A descriptor manager provides an
interface for the request processor to store,
retrieve, and update bitfile descriptors.
Bitfile descriptors are accessed using a
bitfile ID as a key, which is assigned by the
descriptor manager when the bitfile
descriptor is created.








A convenient way to classify bitfile
descriptors is by origin and usage. Typical
bitfile descriptor classes and some
examples are:

• Created and used by the bitfile client.
  - comment
  - bitfile format

• Created by the bitfile client and used by
  the bitfile server.
  - access-control information
  - account ID
  - bitfile lifetime
  - desired level of redundancy
  - family attribute
  - maximum bitfile length
  - priority
  - security level
  - service class
  - storage class
  - type of storage desired

• Created by the bitfile server and used
  by both the bitfile server and the bitfile
  client.
  - access statistics
  - accounting statistics
  - bitfile allocated length
  - bitfile ID
  - bitfile length
  - creation time

• Created and used by the bitfile server.
  - last backup time
  - last migration time
  - location of backup copy
  - lock information
  - previous location

• Created by the storage server and used
  by the bitfile server.
  - last device to write bitfile
  - location of bitfile

The importance of descriptor tables
necessitates that backup and recovery be
supported by the descriptor manager.
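
A descriptor manager of the kind described in this section can be sketched as a keyed table with a snapshot facility. This is a minimal illustration only: the class and method names are hypothetical, the table lives in memory, and "backup" is an in-process deep copy rather than the durable, recoverable storage a production system would need.

```python
import copy
import itertools


class DescriptorManager:
    """Keeps bitfile descriptors in a table keyed by bitfile ID."""

    def __init__(self):
        self._table = {}
        self._next_id = itertools.count(1)

    def create(self, **attributes):
        bitfile_id = next(self._next_id)   # the key is assigned at creation
        self._table[bitfile_id] = dict(attributes, bitfile_id=bitfile_id)
        return bitfile_id

    def retrieve(self, bitfile_id):
        return dict(self._table[bitfile_id])   # a copy, not the live record

    def update(self, bitfile_id, **changes):
        self._table[bitfile_id].update(changes)

    def backup(self):
        return copy.deepcopy(self._table)

    def restore(self, snapshot):
        self._table = copy.deepcopy(snapshot)
```

Returning copies from `retrieve` keeps the request processor from altering a descriptor except through `update`, which is the single point where changes could be logged for recovery.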


3.3.4 Bitfile ID Authenticator

The bitfile ID authenticator implements a
mechanism, such as an access list or DES
encryption used in a capability system,
which protects the bitfile ID from being
forged. It may also enforce security policy
based on the security level of the bitfile, the
request message, or the client. The
authenticator is called by the descriptor manager
when the bitfile ID is created to support the
authentication mechanism and, when a
request for access to the bitfile is received, to
authenticate the bitfile ID presented by the
client. If the access control is via an access
control list, an identifier of the accessing
entity (principal ID) must accompany the
request and be checked against an access
list that is kept, at least logically, in the
bitfile descriptor. If access control is via a
capability system, the bitfile ID may be
encrypted along with some redundant and
access-right information within the
capability, and decrypted by the authenticator
and compared against information in the
descriptor when the bitfile is accessed. It is
assumed that the authentication module
can be added by a site or systems integrator
since access control mechanisms and
security policies are site-dependent (Jones 79b,
Donnelley 80, Mullender 84).
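
The capability approach can be illustrated with a keyed message authentication code. Note the substitution: the text describes DES encryption of the bitfile ID together with access-right information, whereas this sketch seals the same fields with HMAC-SHA256 from the Python standard library. The class name, the rights encoding (a string such as "r" or "rw"), and the tuple layout are assumptions made for illustration.

```python
import hashlib
import hmac


class CapabilityAuthenticator:
    """Seals a bitfile ID and its access rights with a keyed MAC."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key

    def _tag(self, bitfile_id: int, rights: str) -> str:
        message = f"{bitfile_id}:{rights}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def issue(self, bitfile_id: int, rights: str) -> tuple:
        # Called when the bitfile ID is created.
        return (bitfile_id, rights, self._tag(bitfile_id, rights))

    def authenticate(self, capability: tuple, wanted: str) -> bool:
        # Called when a request for access to the bitfile arrives.
        bitfile_id, rights, tag = capability
        genuine = hmac.compare_digest(tag, self._tag(bitfile_id, rights))
        return genuine and wanted in rights
```

A client that alters the rights field, or presents a capability issued under a different site key, fails the comparison, which is the forgery protection the authenticator is meant to provide.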

3.3.5 Migration Manager

No single storage server now available can 
provide both the performance and large 
capacity often needed by a large storage 
system. A successful strategy is for a 
number of bitfile servers and their asso- 
ciated storage servers to be operated as a 
storage hierarchy. 

A migration manager is associated with
each bitfile server. The migration manager
is responsible for maintaining enough free
space on the logical volumes managed by its
bitfile server (e.g. disk storage) to ensure
that requests for new bitfiles can be
honored. When the migration manager
initiates a migration procedure, it first queries
the bitfile descriptor manager for
information about all of the bitfiles that
might be migrated. This information might
include the bitfile priority, size, locks,
activity, idle time, and client-desired
response. Bitfile clients may be given
different degrees of control, by various site
management policies, over the placement
of their bitfiles in the storage hierarchy.
Using policy set by the site manager, the
migration manager determines which
bitfiles should be moved. Finally, the
migration manager sends requests to the bitfile
server request processor to move the
bitfiles to a bitfile server "lower" in the
storage hierarchy. (Bitfiles move "up" in
the hierarchy, toward higher-performance
servers, as they are accessed; this
movement is orchestrated by the bitfile server
request processor.)
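
One simple policy consistent with the description above is to migrate unlocked bitfiles in order of idle time until a free-space target is met. The sketch below assumes that policy purely for illustration; the reference model leaves the policy to the site manager, and the names used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BitfileInfo:
    bitfile_id: int
    size_bits: int
    idle_seconds: float
    locked: bool = False


def choose_for_migration(bitfiles, free_bits, target_free_bits,
                         min_idle_seconds=0.0):
    """Pick unlocked bitfiles, longest idle first, until enough space frees."""
    chosen = []
    eligible = [b for b in bitfiles
                if not b.locked and b.idle_seconds >= min_idle_seconds]
    for b in sorted(eligible, key=lambda b: b.idle_seconds, reverse=True):
        if free_bits >= target_free_bits:
            break
        chosen.append(b.bitfile_id)
        free_bits += b.size_bits
    return chosen
```

The returned IDs would then be handed to the bitfile server request processor, which performs the actual moves to a server lower in the hierarchy.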

An alternate configuration may permit the
migration manager to act as a third-party
controller to initiate the request for a
move. The separate request and data paths
in the reference model allow data to move
directly from a source storage server to a
sink storage server, even though a third
party initiated the transfer between the two
bitfile servers. A request path may span two
or more bitfile servers until the bitfile is
located. To increase performance during
retrieval, it may be desirable to establish a
direct data transfer path, bypassing some
storage servers, once the bitfile has been
located. Such might be the case when
bitfiles are accessed on very rare occasions
and it is not economic to bring them back
up the hierarchy.

3.3.6 Space Limit Manager

The space limit manager checks to see what
logical volumes a given account, user, or
user group is allowed to use; it controls
space allocations, number of bitfiles
allowed, or other policy parameters
associated with space resource management that
a given site may wish to enforce. The space
limit table has entries for each account or
principal ID for maximum and current
space and bitfile limits.
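
A space limit table with maximum and current values per account might be sketched as follows. The field names, the `charge` method, and the use of `PermissionError` are hypothetical choices; the model itself only states what the table records.

```python
class SpaceLimitManager:
    """Space limit table: per-account maximum and current usage."""

    def __init__(self):
        self._table = {}

    def set_limits(self, account, max_bits, max_bitfiles):
        self._table[account] = {"max_bits": max_bits, "bits": 0,
                                "max_bitfiles": max_bitfiles, "bitfiles": 0}

    def charge(self, account, bits, new_bitfile=False):
        """Record an allocation, or raise if it would overdraw the account."""
        entry = self._table[account]
        bitfiles = entry["bitfiles"] + (1 if new_bitfile else 0)
        if entry["bits"] + bits > entry["max_bits"]:
            raise PermissionError("space limit exceeded")
        if bitfiles > entry["max_bitfiles"]:
            raise PermissionError("bitfile count limit exceeded")
        entry["bits"] += bits
        entry["bitfiles"] = bitfiles
```

Both limits are checked before either counter is changed, so a rejected request leaves the table as it was.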

3.4 Storage Server 

A storage server (Savage 88) may best be
visualized as an intelligent storage
controller and its suite of storage devices. A
storage server consists of a physical storage
system (containing the physical bitfile-
storage medium), a logical-to-physical
volume manager, a physical device
manager, a means of command
authentication (unless it is a trusted
component of the storage control
processor), and some part of or intimate
connection to the bitfile mover. A diagram
of the storage server is shown in Figure 3.

The abstract objects of the storage server
that are visible to the bitfile server are
logical volumes and bit string segment
descriptors. The descriptors of the space
occupied by a bitfile form an ordered set of
bit string segments identified by
descriptors, each of which contains the logical
volume ID, the starting point of the segment
on the logical volume, and the length of the
segment. The bit string segment descriptors
are created by the storage server and stored
in the descriptor tables of the bitfile server.
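
The ordered set of segment descriptors is what lets an offset within a bitfile be turned into a position on a logical volume. The sketch below uses the three fields named in the text (logical volume ID, starting point, length); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDescriptor:
    logical_volume_id: str
    start: int     # starting point of the segment on the logical volume
    length: int    # segment length in bits


def locate(segments, bitfile_offset):
    """Map an offset within the bitfile to (logical volume ID, position)."""
    remaining = bitfile_offset
    for seg in segments:               # segments are kept in bitfile order
        if remaining < seg.length:
            return seg.logical_volume_id, seg.start + remaining
        remaining -= seg.length
    raise IndexError("offset beyond end of bitfile")
```

For a bitfile stored as a 64-bit segment on one volume followed by a 32-bit segment on another, offset 64 resolves to the first bit of the second segment.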

Each logical volume is considered to be a
logical image of flawless media usable for
storing bitfile data blocks, thus providing
the separation of physical and logical
space. Separation of logical and physical
volumes supports segment relocation when
media fail, when new storage devices are
introduced, and when space utilization or
transfer rate are optimized. Any media area
that is unavailable for data storage because
of flaw areas, formatting, control tracks,
etc., is excluded from representation in the
logical volume by the logical-to-physical
mapping function.

The operations supported by the storage
server are listed in Table 2. The site
manager has a number of privileged
operations including create, destroy, modify,
and query of logical volumes, physical
volumes, and physical devices.

3.4.1 Physical Storage System 

A physical storage system consists of the 
devices used to read and write volumes and 
the drivers to control those devices (to po- 
sition heads properly in relation to the 
media before reading or writing, etc.). The 
available physical storage systems cover a 
broad spectrum of characteristics in terms 
of random or sequential access, rewritable
or write-once media, capacity, and
performance.

3.4.2 Physical Device Manager 

The physical device manager
communicates with drivers in the physical
storage system to load, unload, and
position media volumes (it is the bitfile
mover that controls the actual transfer of
data).

Physical device managers vary from simple
modules associated with fixed-media
devices, such as Winchester disks, to complex
modules that deal with manually mounted
volumes, as in systems with standard
magnetic tape drives, or automatically
mounted volumes, such as in the IBM 3850
and the STK 4400 Automated Cartridge
Library. In automated systems, the
physical device manager communicates
with a physical volume repository to
request that physical volumes be mounted.
The physical device manager maintains a
mounted volume table to optimize mount
requests. It schedules and executes requests
in a manner that attempts to give the
desired response to its clients while at the
same time making the best use of the
storage system and communication
resources. For example, it may be desirable
to give higher priority to transfers for
which volumes are already mounted or to
small bitfile transfers, to limit the number
of concurrent large bitfile transfers, or to
use a client-specified priority.

Figure 3

The Storage Server
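
The scheduling preferences listed above can be combined into a single sort key. The ordering used here (mounted volumes first, then client priority, then smaller transfers) is one possible weighting chosen for illustration, not one the reference model specifies, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TransferRequest:
    request_id: int
    volume_id: str
    size_bits: int
    client_priority: int = 0     # larger means more urgent


def schedule(requests, mounted_volumes):
    """Order requests: mounted volumes first, then priority, then size."""
    def key(r):
        return (r.volume_id not in mounted_volumes,  # False sorts first
                -r.client_priority,
                r.size_bits)
    return [r.request_id for r in sorted(requests, key=key)]
```

Serving requests for already-mounted volumes first avoids mount operations, which are typically the slowest step in a removable-media system.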

3.4.3 Logical-to-Physical Volume Manager

The logical-to-physical volume manager
maintains descriptors of attributes for each
logical and physical volume and a set of
tables to permit mapping the bit string
segment descriptors of the logical volumes
onto physical space in one or more physical
volumes. The bit string descriptors include
volume serial number, starting point
address, and attributes for each logical and
physical volume. Examples of attributes
are creation time, size, security level, and
physical volume attributes.

The logical-to-physical volume manager
understands the characteristics of the
actual physical volumes used by the storage
server. Its main functions are to allocate
and deallocate space and to convert logical
bit string segments to physical bit string
segments for that bitfile. The logical-to-
physical volume manager also maintains a
flaw map to map, for example, defective
tracks on a magnetic disk to spare tracks
(some device controllers maintain flaw
maps, making duplicate maps in the
volume manager unnecessary). Similarly,
it maintains a map of disk tracks or
magnetic tape block numbers that are used
for control and formatting and that are
thus unavailable for data storage. When
data is moved within the storage system
because errors start to occur or new
physical devices or volumes are introduced,
the map must be changed.
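As a rough illustration of the mapping described above, the sketch below converts a logical bit string segment into pieces on one or more physical volumes and consults a flaw map. The `Extent` and `VolumeMap` names and their fields are assumptions made for this example; the reference model does not prescribe a table layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    logical_start: int    # first logical bit offset covered by this extent
    length: int           # number of bits in the extent
    volume_serial: str    # physical volume holding the bits
    physical_start: int   # starting point address on that volume


class VolumeMap:
    def __init__(self, extents, flaw_map=None):
        self.extents = sorted(extents, key=lambda e: e.logical_start)
        # (volume serial, defective track) -> spare track
        self.flaw_map = dict(flaw_map or {})

    def to_physical(self, logical_offset, length):
        """Split a logical segment into (volume, physical start, length) pieces."""
        end = logical_offset + length
        pieces = []
        for ext in self.extents:
            lo = max(logical_offset, ext.logical_start)
            hi = min(end, ext.logical_start + ext.length)
            if lo < hi:  # the request overlaps this extent
                offset_in_extent = lo - ext.logical_start
                pieces.append(
                    (ext.volume_serial, ext.physical_start + offset_in_extent, hi - lo)
                )
        if sum(piece[2] for piece in pieces) != length:
            raise ValueError("logical segment is not fully mapped")
        return pieces

    def remap_track(self, volume_serial, track):
        """Return the spare track for a defective one, else the track itself."""
        return self.flaw_map.get((volume_serial, track), track)
```

When data is moved to a new volume, only the affected `Extent` entries change; clients holding logical segment descriptors are unaffected, which is the point of keeping the map in one place.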

A map of the free and used space is
maintained for each logical and physical
volume. Space summary information for
each volume may be retrieved to aid in the
volume selection process. This volume
information is retained in the storage
tables, which must have the same
reliability and performance as the
directory in the bitfile server, i.e., it must be
backed up and recoverable or it must be
possible to build the information from
other records. All of this information is
available to the site manager interface.

3.5 Physical Volume Repository

The physical volume repository (Coleman
88, Savage 88), shown in Figure 4, stores
physical volumes. It may be manual or
mechanical.

The physical volume repository is re- 
sponsible for managing the storage of 
media volumes and for mounting these 
volumes onto drives managed by the 
physical device manager. Volumes may be 
stored in an automated library that 
includes a robot capable of mounting the 
volumes or stored in a vault and mounted 
manually. 

The architecture of the physical volume
repository is that of a server that manages
abstract objects called physical volumes. A
physical volume consists of a media
volume and a volume descriptor. (A
physical volume is similar to a bitfile in
that both include a resource and a resource
descriptor.) The volume descriptor contains
at least the following fields:

• The current physical location of the
media volume. The volume might be in
a vault, in a storage cell of an
automated device, mounted on a drive,
or held by a robot.

• A human-readable label by which an
operator can identify the volume.

• The media type. One physical volume
repository might manage both
magnetic and optical media, different
varieties of magnetic tape, etc.

• Information to identify the owner of
the volume.

• Access-control information to validate
requests. In a capability-based system,
this information might be an encryption
key. In other systems, this information
might be a list of clients
authorized to access the volume.

• Various statistics associated with the
volume, such as the number of times
the volume has been mounted and the
time of the last mount.
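The minimum set of descriptor fields listed above could be represented as follows. This is only a sketch: the `VolumeDescriptor` and `Location` names are hypothetical, and access control is modeled as a list of authorized clients (the non-capability case mentioned in the text) rather than an encryption key.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Location(Enum):
    VAULT = "vault"
    STORAGE_CELL = "storage cell of an automated device"
    MOUNTED = "mounted on a drive"
    IN_ROBOT = "held by a robot"


@dataclass
class VolumeDescriptor:
    location: Location                 # current physical location
    label: str                         # human-readable label for operators
    media_type: str                    # e.g. "magnetic tape", "optical disk"
    owner: str                         # information identifying the owner
    authorized_clients: set = field(default_factory=set)  # access-control list
    mount_count: int = 0               # statistic: number of mounts
    last_mount_time: Optional[float] = None  # statistic: time of last mount

    def may_access(self, client: str) -> bool:
        """Validate a request against the access-control list."""
        return client in self.authorized_clients

    def record_mount(self, when: float) -> None:
        """Update location and statistics when the volume is mounted."""
        self.location = Location.MOUNTED
        self.mount_count += 1
        self.last_mount_time = when
```

Because the statistics change on every mount, the descriptor has to live on storage the repository can rewrite, not on the media volume itself.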

Associated with each physical volume is a
volume identifier. This identifier, when
included in a request, allows the physical
volume repository to locate the descriptor
for the media volume and, in a capability-
based system, proves that the client is
authorized to access the volume. The format
of the volume identifier is not specified by
the reference model. If the medium is
optical disk and only one side of a physical
disk can be read at a time, there may be a
unique volume identifier associated with
each side of a disk.


The physical volume repository maintains
the volume descriptors on a storage device
to which it has access. The physical volume
repository cannot maintain the volume
descriptor on the volume itself because:

• The reference model does not specify
the format of the data on a volume. In
some implementations, the physical
device manager may be able to read
volume labels (using a bitfile mover),
but if unlabeled volumes are allowed,
only the bitfile client or the ultimate
application can interpret the contents
of the volume.

• Some types of archival media do not
"update in place", preventing
the physical volume repository from
maintaining dynamic information,
such as the time of last access, on the
volume itself. Information on WORM
optical disks, once written, cannot be
modified. Some volumes, such as CD-
ROMs, cannot be written at all.

• One of the important pieces of
information in the volume descriptor is the
physical location of the volume. One
can hardly access the volume to
determine where it is!

Table 2
Storage Server Functions

Function       Parameters                    Response

SS-Allocate    logical volume IDs            new segment descriptors
               existing segment descriptors
               desired length

SS-Deallocate  existing segment descriptors

SS-Retrieve    segment descriptors           number of bits transferred
               starting offset
               bit string length
               sink descriptor

SS-Store       segment descriptors           number of bits transferred
               starting offset
               bit string length
               source descriptor

The client interface consists of the opera- 
tions necessary to allow the physical device 
manager to mount and dismount volumes 
and to allow the site manager to query and 
change the state of the repository. The op- 
erations and parameters that are unique to 
the physical volume repository are listed in 
Table 3 and described in the following 
paragraphs. 






PVR-Dequeue

Any queued request for the specified
volume with the specified write-protect
mode that includes the specified drive
as an acceptable drive is cancelled.

The dequeue function is routinely used
by the physical device manager to
remove requests for manually
mounted volumes. Even though the
physical volume repository maintains
the queue of requested volumes, the
physical device manager may be the
only module able to detect that a
volume has been mounted on a drive
not accessible to the physical volume
repository. If an operator inserts a
requested volume into an automated
library, the physical volume
repository will mount the volume on
an available drive; if the physical
volume repository can identify the
volume by reading an external label,
and a request for this volume is
queued, the physical volume repository
will choose a drive acceptable for that
request. Otherwise, the physical
volume repository will choose any drive
capable of handling the volume. In any
event, the physical volume repository
will not remove the request from the
queue until it receives a dequeue
command from the physical device
manager.


Figure 4: The Physical Volume Repository


PVR-Dismount

The media volume on the specified
drive is dismounted and stored in a
location selected by the physical
volume repository.

The volume identifier is included in
the dismount command to allow the
physical volume repository to update
its records; if the physical volume
repository has a mechanism to
identify the volume itself (by reading
an external label), the volume
identifier serves to confirm the
physical volume repository's records
and to detect anomalies.

PVR-Eject

The volume is removed from the
domain of the physical volume
repository. The "reason" parameter is
an optional string to be sent to the
operator explaining why the volume is
being ejected.

In an automated system, the Eject
command will probably result in the
volume being moved to a port
accessible by the operator.


PVR-Locate

The PVR-Locate function is used to
determine volume locations when, for
example, an automated library has
failed and volumes are being accessed
manually.


PVR-Mount

A media volume is mounted on a drive.
Volumes queued for manual mounting
are displayed on an operator console if
the physical volume repository
controls such a device (remotely controlled
consoles can use the PVR-ReadQueue
command described below). Some
physical volume repository
implementations may allow concurrent
requests in which no volumes of a group
are mounted until all can be mounted,
or requests with a choice of volumes.
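The interplay between PVR-Mount and PVR-Dequeue can be sketched as below. This is a simplified illustration under stated assumptions: the `PhysicalVolumeRepository` class, its attribute names, and the use of `None` to mean "queued for manual mount" are all hypothetical, and a volume the robot cannot place on a free acceptable drive is simply queued.

```python
class PhysicalVolumeRepository:
    def __init__(self, robot_volumes, free_drives):
        self.robot_volumes = set(robot_volumes)  # volumes a robot can reach
        self.free_drives = set(free_drives)      # drives the repository can load
        self.queue = []                          # requests awaiting a manual mount

    def mount(self, volume_id, write_protect, acceptable_drives):
        """Return a drive ID, or None if the request was queued."""
        if volume_id in self.robot_volumes:
            for drive in acceptable_drives:
                if drive in self.free_drives:
                    self.free_drives.remove(drive)
                    return drive
        self.queue.append((volume_id, write_protect, tuple(acceptable_drives)))
        return None

    def dequeue(self, volume_id, write_protect, drive_id):
        """Cancel matching queued requests; return how many were removed."""
        def matches(entry):
            queued_volume, queued_mode, drives = entry
            return (
                queued_volume == volume_id
                and queued_mode == write_protect
                and drive_id in drives
            )

        before = len(self.queue)
        self.queue = [entry for entry in self.queue if not matches(entry)]
        return before - len(self.queue)
```

Note that `mount` never removes a queued entry on its own; as the text describes, the request stays queued until the physical device manager reports the mount through `dequeue`.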


Table 3
Physical Volume Repository Functions

Function        Parameters                 Response

PVR-Dequeue     volume ID
                write-protect mode
                drive ID

PVR-Dismount    volume ID
                drive ID

PVR-Eject       volume ID
                reason

PVR-Locate      volume ID                  current volume location

PVR-Mount       volume ID                  drive ID or queued for
                write-protect mode         manual mount
                list of acceptable drives

PVR-ReadQueue   queue offset
                maximum number of
                entries to send

PVR-ReadStatus  device ID                  device status
                type of status desired

PVR-SetStatus   device ID
                type of status
                desired value






PVR-ReadQueue

For each request queued for a manual
mount, the volume identifier, list of
acceptable drives, and the write-protect
mode are returned.

Providing a queue offset and a
maximum number of entries to send in
the PVR-ReadQueue command allows
the client to receive only the number of
entries that it can handle. This
function also supports operator
displays not under the control of the
physical volume repository.
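The offset and maximum-entries parameters amount to simple paging. The following sketch shows one way a remote operator display might read the whole queue in pieces; `read_queue` and `fetch_all` are illustrative names, and the queue is modeled as an in-memory list rather than a real repository interface.

```python
def read_queue(queue, offset, max_entries):
    """Return at most max_entries queued requests starting at offset."""
    if offset < 0 or max_entries < 1:
        raise ValueError("offset must be >= 0 and max_entries >= 1")
    return list(queue[offset:offset + max_entries])


def fetch_all(queue, page_size):
    """Client-side loop: read the queue one page at a time."""
    entries, offset = [], 0
    while True:
        page = read_queue(queue, offset, page_size)
        entries.extend(page)
        if len(page) < page_size:  # a short page means the end was reached
            return entries
        offset += len(page)
```

A client with a small display buffer would call `read_queue` directly with its own limits; `fetch_all` is only a convenience for clients that want every entry.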

PVR-ReadStatus

The amount and type of status
information is dependent upon the devices
controlled by the physical volume
repository and upon their configuration.
Status information might include
the on-line status of the device, the
volume identifier of the volume
mounted on the device, current or
previous error information,
configuration information, etc.

PVR-SetStatus

The particular status values are
dependent on the devices controlled by
the physical volume repository. This
function is used to bring devices
on-line, take them off-line, set diagnostic
or manual modes, etc.

3.6 Communication Service 

The communication service includes the
capability to communicate request
messages as well as the bitfile mover (Kitts
88) for high-speed transfer of bitfile data
blocks (Allen 83).

A bitfile mover is a set of modules that
move data from one source/sink to
another. A storage system includes at least
two bitfile movers (Figure 1), one controlled
by the bitfile client and the other controlled
by the storage server. Additional movers
may be required for more complex routing.
Figure 5 shows the control and data paths
necessary to move data from source to sink.

A source or sink can be defined as: 

• a memory buffer, local or remote,

• a media extent, such as on local or
remote disk, or

• a channel interface connected to a
device.

These definitions do not limit the methods
of data transport used by the bitfile mover
or the ability to transform data during the
move. Because the mover's source and sink
interfaces depend on the devices, network
interfaces, and network protocols used by
the site, the reference model does not
specify them. The bitfile mover's control
interface to the source or sink manager,
however, can be specified.

The Move operation supported by the mover
is shown in Table 4.

The source and sink descriptors may
describe network interfaces, buffer addresses,
or device descriptors (device addresses,
block information, etc.). One descriptor is
sent by the bitfile client, the other is
provided by the storage server.

The transformation description may
specify translation, compaction,
compression, encryption, and/or
checksumming to be performed by the mover.
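A minimal sketch of a Move operation with a transformation descriptor is shown below. It is an assumption-laden illustration, not the model's interface: source and sink are modeled as in-memory buffers, only two transformations are supported, and the `move` function and its result keys are hypothetical. The two counts in the response differ whenever a transformation changes the data length.

```python
import zlib


def move(source: bytes, sink: bytearray, transformations=()):
    """Copy source into sink, applying the named transformations in order.

    Returns the number of source bits and sink bits moved, plus a CRC-32
    if checksumming was requested.
    """
    data = source
    checksum = None
    for name in transformations:
        if name == "compression":
            data = zlib.compress(data)
        elif name == "checksumming":
            checksum = zlib.crc32(data)  # computed over the data at this stage
        else:
            raise ValueError(f"unsupported transformation: {name}")
    sink.extend(data)
    return {
        "source_bits": len(source) * 8,
        "sink_bits": len(data) * 8,
        "checksum": checksum,
    }
```

For example, moving 1000 identical bytes with `("compression", "checksumming")` reports 8000 source bits but far fewer sink bits, and the checksum covers the compressed form because it is applied second.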

The site manager interface can, through
privileged commands, interrogate channel
status and other mover statistics.

3.7 Site Manager 

Site management (Collins 88) is the
collection of functions that are primarily
concerned with the control, performance,
and utilization of the storage system. These
functions are often site-dependent, involve
human decision making, and span multiple
servers. The functions may be implemented
as stand-alone programs, may be integrated
with the other storage system software, or
may be policy.

Site management attempts to allocate the
resources of the storage system to the best
use for the overall benefit of the site.
Policies for the site must be set, and the
manual and automatic procedures must be
developed to implement those policies. The
procedures must be adaptable because the
requirements will change as time
progresses and because the same software may
be run at a number of different sites.

Figure 5: The Mover

Table 4
Mover Function

Function  Parameters                 Response

Move      direction                  number of source bits moved
          source descriptor          number of sink bits moved
          sink descriptor
          transformation descriptor

For this discussion, the site management
functions are grouped into seven areas:
storage management, operations, systems
maintenance, software support, hardware
support, administrative control, and bitfile
management. The format will be to state
the requirements and then discuss the tools
needed to satisfy these requirements for
each area.

3.7.1 Storage Management

3.7.1.1 Requirements

The storage management function is
concerned with providing good performance
and reliability for user storage and access
needs, while utilizing the storage servers in
an efficient and cost-effective manner. The
specific goals are to optimize the overall
performance of the storage servers, to
control placement of bitfiles in the storage
hierarchy, to maintain sufficient free space
in each storage server, to control
fragmentation of space on volumes, to add and
delete volumes, to recover data from bad
volumes, to implement data backup
policies, to enforce space allocation
policies, and to determine the need for
equipment. Most of the activities should be
automated to the extent that the task of the
human system administrator is primarily
one of monitoring summary reports and
using reports for planning purposes.

3.7.1.2 Tools Needed

The key to storage management for any
storage system is the ability to gather and
utilize information about the current state
of the storage servers and statistics on their
transaction histories. The total space, total
free space, and distribution of free space on
individual volumes define the state of a
storage server. Information should be
extracted from the transaction log of each
storage server to give the number of bitfile
accesses, amount of data transmitted,
average and mean response times, average
and mean data transfer rates, and patterns
of access by bitfile age, activity, and size.
Performance monitor programs are needed
to provide information such as the average
wait and response times, resource
utilization, demand and contention, and queue
lengths for storage system components.

The migration manager component of the
bitfile server is the primary tool for
implementing storage management policies.
The migration manager uses guidelines set
by the system administrator as well as
historical data and current state data to
determine the amount of free space to keep
available for each storage server, to decide
what bitfiles (by activity, size, etc.) should
be stored on each storage device, and to
determine which bitfiles to move within the
hierarchy to keep active bitfiles readily
accessible.
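One simple policy a migration manager might apply is sketched below: move the least recently accessed bitfiles off a storage server until a free-space target is met. The `Bitfile` type, the `select_for_migration` function, and the least-recently-used rule are assumptions for illustration; the reference model leaves the actual guidelines to the system administrator.

```python
from dataclasses import dataclass


@dataclass
class Bitfile:
    name: str
    size: int           # space occupied on this storage server
    last_access: float  # time of the most recent access


def select_for_migration(bitfiles, capacity, free_target):
    """Pick the least recently accessed bitfiles until free space meets the target."""
    free = capacity - sum(b.size for b in bitfiles)
    chosen = []
    for bitfile in sorted(bitfiles, key=lambda b: b.last_access):
        if free >= free_target:
            break  # enough space would be available after the chosen moves
        chosen.append(bitfile)
        free += bitfile.size
    return chosen
```

A policy that also weighs bitfile size or historical access counts would only need a different sort key.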

The storage system should have automatic
fragmentation control. Information about
the amount of free space and allocated
space can be used to determine when to
repack volumes. This function may be
performed by the migration manager or by
some other storage server module.

Programs to analyze, initialize and label
volumes should be provided, along with
storage server commands to add and delete
volumes. Two distinct types of volumes
exist. Volumes like fixed magnetic disks are
not demountable, and are usually defined to
the operating system at system generation
time. Demountable volumes such as tapes
or optical disks are not subject to these
constraints.

Storage servers and physical volume
repositories should manage their
demountable volumes largely without human
intervention. The only human activities
involved are occasional monitoring and
revision of control parameters and
supplying empty volumes to the storage server
or physical volume repository when
needed.

Two areas of storage management require
the direct involvement of knowledgeable
persons. The first is the recovery of data
from bad volumes. Programs are needed to
analyze, display and modify information
on a volume, and to copy the entire contents
of a volume to another volume without
changing bitfile locations, skipping bad
data which cannot be read after a number of
retries. Data recovery may use the
migration manager to evacuate a bad
volume by moving individual bitfiles. The
decision as to which type of recovery should
be used in a particular case must be left to
an experienced person.

System planning also requires human
involvement. As new products are developed
and old ones discontinued, changes in the
storage servers are needed. In addition,
changes in the network or user
environment may require changes in the
storage management policy or
implementation. Statistical information
may be used to decide when storage
servers/devices should be enhanced,
acquired and phased out. Changing
patterns seen in usage information may be
the best indicator that changes are needed
in the policies of storage management.





3.7.2 Operations

3.7.2.1 Requirements

The operations functions are concerned
with making sure that the storage system
operates continuously and ensuring that
user requests are being satisfied in a timely
and reliable manner. The system must be
monitored and controlled to identify and
resolve problems, to load/unload off-line
volumes, and to verify that the
management jobs have run correctly.

3.7.2.2 Tools Needed

An intelligent operations control center
spanning the complete storage system is
needed. Console displays need to show
active requests for each server, requests
queued for each server, volumes mounted
for each storage server, space summary
information (total number of volumes,
number of empty volumes, free space, etc.)
for each storage system in the hierarchy,
resource status (processors, storage
controllers, storage devices,
communication links, volumes, etc.), and a
special display for resources that are
suspect or unavailable. Operator
commands should be available for each
server to restart or abort requests, and to
set resources available or unavailable.

Storage system log information is needed.
Messages that require action such as
volume mounts and error messages should
go to a display and/or hard copy console.
All messages should be kept in a data base
where they can be easily retrieved and
displayed.

Job summary information is needed.
Successful completion messages and error
messages for system jobs should be written
to a data base where they can be reviewed.
When an important system job fails, such
as backup of the bitfile descriptor tables, an
operator action message should be issued.

The operational means to recover from
temporary and permanent failures is
needed. This includes the ability to isolate
equipment which is failing or needs
preventive maintenance (e.g., tape drives
needing cleaning) and the ability to switch
to backup equipment.


Automation of operations is needed to
maximize the performance and reliability, and
to minimize the manual effort. This
includes automation of volume loading
using a physical volume repository and
automation of the decision-making process
to minimize human errors and human
delays.

3.7.3 Systems Maintenance

3.7.3.1 Requirements

The systems maintenance functions strive
to maintain the performance, reliability,
and availability of the storage system, and
the integrity of the stored data.
Performance is supported by monitoring
the individual components and devices as
well as the overall storage system for failing
components or out-of-balance conditions.
The key to reliability and availability is
the preservation of critical system
information in an environment of possible
hardware errors, software errors and
system crashes. This information includes
name server directories, bitfile descriptor
tables, space limit tables, physical volume
tables, physical device descriptor tables,
logical-to-physical maps, logical volume
tables, network configuration tables and
transaction logs.

3.7.3.2 Tools Needed

System programmers must have the ability
to quickly make changes in operating
system or storage system parameters that
affect the performance of the system. These
tuning parameters may be available at
execution and/or compile time. A "super-
user" capability is needed so that a system
programmer can execute all commands and
have access to all system data.

Tools to maintain the integrity of
information are needed. Programs must be
available to back up bitfiles and volumes,
and to restore information from the
backups. Additional tools must be
available for important, dynamic tables
and data bases where a backup quickly
becomes out-of-date. One technique is to
keep a secondary copy of dynamic
information in addition to the primary
copy. If either the primary or secondary
fails, a new copy is immediately made of
the good copy. Another technique is to keep
a journal of the important transactions. If
a failure occurs, the journal is applied to a
backup to restore the information to the
current level. The recovery programs
needed to restore a backup to the current
level following a crash must be available
and well tested. Several persons should be
familiar with the procedures required.
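The journal technique can be sketched in a few lines. This is a minimal illustration under assumed conventions: the table is a dictionary, each journal record is a hypothetical `(sequence number, operation, key, value)` tuple, and the backup records the sequence number of the last transaction it contains.

```python
def restore(backup, journal, backup_seq):
    """Apply journal records newer than the backup to bring it up to date."""
    table = dict(backup)  # work on a copy so the backup itself stays intact
    for seq, operation, key, value in journal:
        if seq <= backup_seq:
            continue  # already reflected in the backup
        if operation == "put":
            table[key] = value
        elif operation == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {operation}")
    return table
```

Skipping records at or below `backup_seq` makes the replay safe to run against a journal that was started before the backup was taken.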

A checkpoint capability is needed to restore
critical storage system tables and data
bases to a consistent state if a crash occurs
during a transaction that makes multiple
changes (such as saving a bitfile, which
makes a new bitfile descriptor, updates the
directory that points to the descriptor, and
may update the accounting data base as
well). During restart following an
abnormal termination, the checkpoints are
used to either complete or back off requests
so that the tables and data bases will be
consistent.
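The "back off" half of checkpoint recovery might look like the sketch below. It assumes each checkpoint is a hypothetical record holding a commit flag and an undo list of `(table name, key, previous value)` entries written before each change; completing a partially applied request (the other option the text mentions) is not shown.

```python
def recover(tables, checkpoints):
    """Back off every transaction whose checkpoint has no commit mark."""
    for checkpoint in checkpoints:
        if checkpoint["committed"]:
            continue  # all of its changes are already consistent
        # Undo in reverse order so later changes are reverted first.
        for table_name, key, old_value in reversed(checkpoint["undo"]):
            if old_value is None:
                tables[table_name].pop(key, None)   # the key did not exist before
            else:
                tables[table_name][key] = old_value
    return tables
```

After `recover` runs, a crash in the middle of saving a bitfile leaves neither an orphaned descriptor nor a directory entry pointing at a missing one.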

Verification programs are needed to check
the consistency of storage system
information. These programs should be designed to
run in parallel with the system so that
operation may continue while verification
is done.

Tools to help with problem determination
are needed. These include trace capabilities,
breakpoint capabilities, selected printing
of formatted and unformatted dumps of
data and programs, and dump analysis
programs. Tools are needed to modify and
repair storage system information.

3.7.4 Software Support

3.7.4.1 Requirements

For sites that develop new storage system 
software, facilities must be available to 
develop, maintain and test that software. 

For customer sites, a test facility is
required to verify that a new version satisfies
local security and other requirements
before production use.

3.7.4.2 Tools Needed

An environment must be provided to test
software changes and enhancements
without disrupting the production use of the
storage system. The ability to run a test
version and the production version of the
software simultaneously is necessary. The
test software may run on the same
processors as the production software or
run on other processors. It may share
devices such as the communication systems
and the physical storage systems, but it
should have its own tables, data bases,
volumes, etc. Instead of running a complete
test version of the storage system software,
a test version of a particular component
(e.g., a bitfile server) could be run using
components of the production system for
the rest of the system.

A regression testing capability should be
available so that a comprehensive set of
tests can be run at any time against the
production or test system to verify security,
integrity, and performance. Both the
running and checking of the regression
tests should be automated.


3.7.5 Hardware Support

3.7.5.1 Requirements

The hardware support functions are
concerned with the display, diagnosis and
correction of hardware problems, and the
configuration and installation of the
hardware. Hardware failures and the time
needed to repair failures must be
minimized, especially for those failures
that bring down the storage system. It must
be easy to reconfigure the system hardware,
and install and remove equipment.

3.7.5.2 Tools Needed 

Programs to report hardware errors are
needed. These programs should be able to
give a detailed time history of hardware
errors, and show correlated summaries of
both temporary and permanent errors by
error type, device type, specific device and
volume, over specified time intervals. The
ability to recognize the beginning of a
problem before it becomes permanent is
especially important when dealing with
storage devices/volumes where permanent
errors generally mean the loss of data.
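A correlated error summary of the kind described above can be built with a counter keyed on the attributes of interest. The sketch assumes each error record is a dictionary with hypothetical `time`, `device`, `error_type`, and `permanent` fields; the threshold used to flag a device is likewise an illustrative choice.

```python
from collections import Counter


def summarize_errors(errors, start, end):
    """Count errors per (device, error type, permanent?) within [start, end)."""
    summary = Counter()
    for err in errors:
        if start <= err["time"] < end:
            summary[(err["device"], err["error_type"], err["permanent"])] += 1
    return summary


def suspect_devices(summary, threshold):
    """List devices whose temporary error count has reached the threshold."""
    temporary = Counter()
    for (device, _error_type, permanent), count in summary.items():
        if not permanent:
            temporary[device] += count
    return sorted(device for device, count in temporary.items() if count >= threshold)
```

Watching the temporary (recovered) error count is one way to notice a device that is degrading before an unrecoverable error causes data loss.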

Programs to exercise and diagnose all
hardware components of the storage system
are needed. These programs should be able
to analyze the errors and recommend the
corrective action. Storage devices with
mechanical parts, such as magnetic disk,
optical disk, magnetic tape, and especially
physical volume repositories, have a much
higher error rate than strictly electronic
hardware, so diagnostic and exercise
programs play an important role in storage
systems.

The system should be redundantly
configured so that components and paths can be
isolated, removed for repair and upgraded
with a minimal impact upon operation and
performance. Dynamic reconfiguration
capabilities, including the switching of the
production software to a backup processor,
should be available.

3.7.6 Administrative Control

3.7.6.1 Requirements

Administrative control covers the security,
accounting, and management policies of
the storage system. The security
requirements are to implement the security
policies of the site and to recognize if policy
violations are being attempted. The
accounting requirements are to gather
usage information, to charge for use of
resources, and to control the resources. The
management requirements are to present
summary information concerning the
operation and performance of the system
that can be used to justify operational and
equipment expenditures and to set high-
level policy.

3.7.6.2 Tools Needed 

The storage system must implement the
particular security policies of each site by
building the policies into the programs or
through the use of replaceable modules. In
general, the policies involve verification
that a user has access to the requested
resources of a server. Access or capability
information is stored with the resource and
checked against similar information in the
request. For some sites it is required that
classification and partition levels be
associated with bitfiles and requests, and that
access be controlled based on certain
classification and partition rules. A security
log must be available that contains all
security violations (as determined by a site)
and all transactions above a specified
security classification level. A log of all
transactions should be kept to help
diagnose anomalous situations.

The storage system needs a resource-
charging mechanism. Charges may be
incurred for the following resources and
services: amount of data stored, number of
bitfiles stored, data transferred, bitfiles
accessed, and any of the requests made to
the bitfile servers. These charges may vary
for the different bitfile servers, depending
on the level of performance and the class of
storage used. Requests made to the storage
system should contain an account code to
which the charge is to be made. An account
code can be stored in each bitfile descriptor
along with the storage space used, the
length of time stored, and a reference time
for accounting purposes. An accounting
program obtains the storage and bitfile
charge information from the bitfile
descriptors; obtains the access, data transfer
and request charge information from the
transaction logs; accumulates and sorts the
charge information; and writes the charge
information to an accounting file. Another
accounting program has the resource
charging rates and calculates the bills.
Since the account codes often change, an
automatic means of updating the bitfile
descriptors is needed. One approach is to
have a central data base of accounts from
which an accounting program can update
the bitfile descriptors. This data base can
also be used to validate users and to show
what resources they can use.
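The two-program split described above (one gathers usage, the other applies rates) can be sketched as follows. All record layouts and rate names here are assumptions for illustration: descriptors carry `account`, `bits_stored`, and `days_stored`; log entries carry `account` and `bits_transferred`; rates are plain integers in an unspecified currency unit.

```python
def gather_usage(descriptors, transaction_log):
    """First program: accumulate usage per account code."""
    usage = {}

    def entry(account):
        return usage.setdefault(
            account, {"bit_days": 0, "bits_transferred": 0, "requests": 0}
        )

    for descriptor in descriptors:      # storage charges come from descriptors
        entry(descriptor["account"])["bit_days"] += (
            descriptor["bits_stored"] * descriptor["days_stored"]
        )
    for record in transaction_log:      # access charges come from the logs
        account_usage = entry(record["account"])
        account_usage["bits_transferred"] += record["bits_transferred"]
        account_usage["requests"] += 1
    return usage


def calculate_bills(usage, rates):
    """Second program: apply the charging rates to the accumulated usage."""
    return {
        account: (
            u["bit_days"] * rates["storage_per_bit_day"]
            + u["bits_transferred"] * rates["transfer_per_bit"]
            + u["requests"] * rates["per_request"]
        )
        for account, u in sorted(usage.items())
    }
```

Keeping the rates out of `gather_usage` means a rate change, or a different rate table for a faster bitfile server, never requires re-reading descriptors or logs.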

The summary information used by
management to set high-level policy needs to be
extracted from all the other site
management reports and data bases, and presented
in a useful (usually graphic) manner. A
number of vendor products are available
that can be used to extract, process and
display information.






Allen, I. D. (1983). The role of intelligent
peripheral interfaces in systems
architecture. Proc. Nat. Computer Conf., pp. 623-
630.

Almes, G. T., Black, P., Lazowska, D., and
Noe, D. (1985). The Eden system: a technical
review. IEEE Trans. on Software
Engineering SE-11 (1), 43-59.

Birrell, A. D., Levin, R., Needham, M., and
Schroeder, D. (1982). Grapevine: an exercise
in distributed computing. Comm. ACM, Vol.
25, No. 4, 260-274.

Booch, G. (1986). Object-oriented
development. IEEE Trans. on Software
Engineering SE-12 (2), 211-221.

Brownbridge, D. R., Marshall, L. F., and
Randell, B. (1982). The Newcastle
connection. Software Practice and Experience
12, 1147-1162.

Coleman, S. and Watson, R. (1984). Storage
in the LLNL Octopus network: an overview
and reflections. Digest of Papers, Sixth
IEEE Symposium on Mass Storage
Systems, Vail, Colorado, June 1984.

Coleman, S. (1988). Physical volume
repository. Digest of Papers, Ninth IEEE
Symposium on Mass Storage Systems,
Monterey, California, November 1988.

Collins, B., Devaney, M., and Wilbanks, E.
(1982). A network file storage system.
Digest of Papers, Fifth IEEE Symposium on
Mass Storage Systems, Boulder, Colorado,
October 1982.

Collins, B. (1988). Mass storage system
reference model system management.
Digest of Papers, Ninth IEEE Symposium
on Mass Storage Systems, Monterey,
California, November 1988.

Davis, J. D. (1982). Mass storage systems: a
current analysis. Digest of Papers, Fifth
IEEE Symposium on Mass Storage
Systems, Boulder, Colorado, October 1982.


DIS8571. Information processing systems,
open systems interconnection, file
transfer, access, and management (in four
parts, draft international standard
ISO/DIS8571), distributed by Omnicom
Information Services.

Donnelley, J. E., and Fletcher, J. G. (1980).
Resource access control in a network
operating system. Proc. ACM Pacific 80
Conf.

Enslow, P. H., Jr. (1978). What is a
"distributed" data processing system?
Computer, Vol. 11, No. 1, Jan., 13-21.

Falcone, Joseph R. (1988). The bitfile server
in the IEEE reference model for mass
storage systems. Digest of Papers, Ninth
IEEE Symposium on Mass Storage
Systems, Monterey, California, November
1988.

Fletcher, J. G. (1975). Computer storage
structure and utilization at a large
scientific laboratory. Proc. IEEE 63 (8), 1104-
1113.

Foglesong, Joy, et al. (1990). The Livermore
distributed storage system:
implementation and experiences. Digest of
Papers, Tenth IEEE Symposium on Mass
Storage Systems, Monterey, California,
May 1990.

Gary, Mark (1990). Overcoming Unix kernel
deficiencies in a portable, distributed
storage system. Digest of Papers, Tenth
IEEE Symposium on Mass Storage
Systems, Monterey, California, May 1990.

Gentile, R. B., and Lucas, J. R. (1971). The
TABLON mass storage network. Proc.
Spring Joint Computer Conf., pp. 345-356.

Grossman, C. P. (1989). Evolution of the
DASD storage control. IBM Systems
Journal, Vol. 28, No. 2, 1989.

Harris, J. P., Rhode, R. S., and Arter, N. K.
(1975). The IBM 3850 mass storage system:
design aspects. Proc. IEEE 63 (8), 1171-1179.








Hogan, Carole, et al. (1990). The Livermore
distributed storage system: requirements
and overview. Digest of Papers, Tenth IEEE
Symposium on Mass Storage Systems,
Monterey, California, May 1990.

Howie, H. R., Jr., and Salbu, E. (1975). Mass
storage implementation approaches: a
spectrum. AFIPS The Information
Technology Series, Memory and Storage
Technology.

Johnson, C. (1975). IBM 3850 mass storage
system. AFIPS Conf. Proc. 44.

Jones, A. K. (1979). The object model: a
conceptual tool for structuring software.
"Operating Systems", Springer-Verlag,
Berlin.

Kitts, D. (1988). Bitfile mover. Digest of
Papers, Ninth IEEE Symposium on Mass
Storage Systems, Monterey, California,
November 1988.

Kuehler, J. D., and Kerby, H. R. (1966). A
photographic mass storage system. AFIPS
FJCC Proc. 29, 735-742.

Lantz, K. A., Edighoffer, J. L., and Hitson, B.
L. (1985). Toward a Universal Directory
Service. Report No. STAN-CS-85-1086,
Stanford University.

Leach, P. J., et al. (1982). UIDs as internal
names in a distributed file system. Proc.
Symposium on Principles of Dist.
Computing, Ottawa, 34-41.

LeLann, G. (1981). Motivation, objectives,
and characteristics of distributed systems.
"Distributed Systems - Architecture and
Implementation", Springer-Verlag, Berlin,
1-9.

McLarty, T., Collins, B., and Devaney, M.
(1984). A functional view of the Los Alamos
central file system. Digest of Papers, Sixth
IEEE Symposium on Mass Storage
Systems, Vail, Colorado, June 1984.

Miller, S. W., and Collins, B. (1985). Toward
a reference model for mass storage systems.
Computer, Vol. 18, No. 7, July, 9-22.

Miller, S. W. (1988a). "A Reference Model
for Mass Storage Systems". Advances in
Computers, Volume 27, Academic Press.

Miller, S. W. (1988b). Mass storage reference
model, special topics. Digest of Papers,
Ninth IEEE Symposium on Mass Storage
Systems, Monterey, California, November
1988.

Mullender, J., and Tannenbaum, A. S.
(1984). Protection and resource control in
distributed operating systems. Computer
Networks 8, 421-432.

Nelson, M., Kitts, D. L., Merrill, J. H., and
Harano, G. (1987). The NCAR mass storage
system. Digest of Papers, Eighth IEEE
Symposium on Mass Storage Systems,
Tucson, Arizona, May 1987, pp. 12-20.

O'Lear, B. T., and Choy, J. H. (1982). Software
considerations in mass storage systems.
Computer 15 (7), 36-44.

Penny, S. J., and Alston-Garnjost, M. (1970).
Design of a very large storage system. Proc.
Fall Joint Computer Conf., pp. 45-51.

Sandberg, R. (1985). Design and
implementation of the SUN network file system.
Proc. Tenth Usenix Conference, 119-130.

Savage, P. (1985). Proposed guidelines for
an automated cartridge repository.
Computer, Vol. 18, No. 7, July, 49-58.

Savage, P. (1988). Storage server as physical
box. Digest of Papers, Ninth IEEE
Symposium on Mass Storage Systems,
Monterey, California, November 1988.

Svobodova, L. (1984). File servers for
network-based distributed systems.
Computing Surveys 16 (4), 354-398.

Watson, R. W. (1980). Network architecture
design for a back-end storage network.
Computer, Vol. 13, No. 2, Feb., 32-48.

Watson, R. W. (1981a). Distributed system
architecture model. Distributed Systems -
Architecture and Implementation,
Springer-Verlag, Berlin, 10-43.





Watson, R. W. (1981b). Identifiers (naming)
in distributed systems. Distributed Systems -
Architecture and Implementation,
Springer-Verlag, Berlin, 191-210.

Watson, R. W. (1984). Requirements and
overview of the LINCS distributed operating
system architecture. Lawrence Livermore
National Laboratory, Preprint UCRL-90906.

Watson, R. W. (1987). Tutorial notes. Eighth
IEEE Symposium on Mass Storage
Systems, Tucson, Arizona, May 1987.

Watson, R. W. (1988). The Architecture of
Future Operating Systems. UCRL Preprint
99896, presented at the Cray Users Group
Meeting, Tokyo, Japan, September 1988.

Wildman, M. (1975). Terabit memory
system: design history. Proc. IEEE 63 (8),
1160-1165.






5. Glossary


Authentication Request/Reply

The command to test the access rights
of the requestor to a particular service.

Bitfile

A collection of data that can be created
on, read from, written into, and deleted
from a storage system. These data are
treated as a string of bits without any
particular structure.

Bitfile Authenticator

The process that checks the access
rights of a requestor for service.

Bitfile Descriptor

The bitfile attribute information that
is stored as an entry in the bitfile
descriptor table.

Bitfile Descriptor Manager

The process that manages the bitfile
descriptor table.

Bitfile Descriptor Table

The data store where the bitfile
descriptors are stored.

Bitfile ID

A machine-oriented, globally unique
identifier of a bitfile.

Bitfile Mover

The processes, including the high-level
protocols, that control the movement
of bitfiles.

Bitfile Server

The set of processes that control the
creation, destruction, and access to the
many bitfiles under its control.

Bitfile Server Request Processor

The portion of the bitfile server that
acts upon requests and controls the
request/reply communications with
internal modules as well as other
processes and servers.

Bitfile Transfer

The high-speed movement of bitfile
data blocks.


Client Request/Reply

The list of permitted commands from a
client to a server and the resulting
responses.

Create

The bitfile client request to form a
bitfile descriptor record in the bitfile
descriptor table. The bitfile attributes
to be contained in the bitfile descriptor
are specified in the request.

Destroy

The bitfile client request to remove a
bitfile descriptor from the bitfile
descriptor table. The space allocated to
the bitfile is deallocated and, if the
media can be rewritten, returned to the
free space list.

Lock

The bitfile client request to establish a
lock for a bitfile in preparation for one
or more stores or retrieves of the
bitfile.

Modify

The bitfile client request to change one
or more attributes of a bitfile as
contained in the bitfile descriptor table.

Monitor Information

Status information from storage
system modules used by the site
manager to assist in the management
and control of the storage system.

Move Command

The request to move a bitfile between
specified devices.

Name Server

The server that converts between the
human-oriented name for a bitfile and
the machine-oriented name for the
same bitfile.

Physical Volume

A bounded unit of storage media that is
used to store bitfiles.




Physical Volume Move

The physical movement of a volume
between the volume repository and a
storage server or its return.

Physical Volume Repository

The place where physical volumes are
stored when they are not at a read/
write station.

Principal ID

Identification of the agent requesting
service from the bitfile server.

PVR-Dismount

A request sent to the physical volume
repository to remove a physical
volume from a drive.

PVR-Mount

A physical volume ID sent to the
physical volume repository with the
request to mount it on a particular
storage device in the storage server.

Query

The bitfile client request to obtain
information about a bitfile or its
attributes from the bitfile descriptor
table.

Retrieve

The bitfile client request to move a
bitfile or a segment of a bitfile from a
storage server to the bitfile client. If
only a segment is to be moved, then the
internal starting bit address and the
bit string length must be specified.

Site Control

Commands from the site manager for
initial set up, operations, and
management of the storage system.

SS-Allocate

The request to a storage server to make
logical space available for bitfile
storage.

SS-Deallocate

The request to a storage server to
remove a bitfile from physical storage
and return the space to the free space
list.

SS-Retrieve

The request from a bitfile server to
move a bitfile from a storage server to a
bitfile client.

SS-Store

The request from a bitfile server to
move a bitfile from bitfile client buffer
to storage server media.

Status

The bitfile client request for the status
of the bitfile server or of a previous
request made by the bitfile client.

Store

The client request to move a bitfile
data block from the bitfile client to a
bitfile server medium. This request
may include the ability to append a
segment at the end of an existing bitfile
or to update a physically specified
segment of an existing bitfile.

Unlock

The client request to release the lock
held on a bitfile.
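
The Retrieve entry above says a segment is selected by an internal starting bit address and a bit string length. The sketch below illustrates that addressing only. The function name `retrieve_segment` and the most-significant-bit-first ordering are assumptions made for this example; the glossary does not specify either.

```python
def retrieve_segment(bitfile: bytes, start_bit: int, bit_length: int) -> str:
    """Return the requested bits of a bitfile as a string of '0'/'1' characters.

    Hypothetical helper: the bitfile is treated as one unstructured bit
    string, numbered from bit 0 at the most significant bit of the first byte.
    """
    total_bits = len(bitfile) * 8
    if start_bit < 0 or bit_length < 0 or start_bit + bit_length > total_bits:
        raise ValueError("segment lies outside the bitfile")
    if total_bits == 0:
        return ""
    # Render the whole bitfile as a zero-padded binary string, then slice it.
    bits = format(int.from_bytes(bitfile, "big"), f"0{total_bits}b")
    return bits[start_bit:start_bit + bit_length]


if __name__ == "__main__":
    # Bytes 0xA5 0x0F are 1010 0101 0000 1111; bits 4..11 are 0101 0000.
    print(retrieve_segment(b"\xa5\x0f", 4, 8))
```

Rendering the whole bitfile as a string is only practical for small examples; a real mover would work on byte ranges and mask the partial bytes at each end.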






IEEE Mass Storage
System Reference Model



1992 Conference on Mass
Storage Systems and
Technologies

September 22, 1992


University of California
Lawrence Livermore
National Laboratory


Agenda



Overview of the Reference Model
Abstract Objects
Description of the Model
Bitfile Client
Name Server
Bitfile Server
Storage Server
Physical Volume Repository
Communication Service
Mover
Site Management


Overview of the Reference Model



• Communications tool for discussing storage systems
• Proper modularity
• Transparency
• Client interfaces
• Take advantage of existing expertise
• Reduce duplication of effort among system developers
• Encourage mutually-compatible commercial products
• Focus on distributed, high-performance storage systems



History of the Model


First specialists workshop - September, 1983

A proposed model presented at the 6th IEEE
Symposium - June, 1984

A refined model presented at the 7th Symposium -
November

Session devoted to the model at the 9th Symposium -
October, 1988

Working Group organized September, 1989

Project 1244 kicked off at the 10th Symposium -
May, 1990




The First Reference Model (June, '84)

[Figure: diagram of the first reference model; labels not recoverable from the scan.]





Abstract Objects


Definitions


Abstraction:

A simplified description of a system that
emphasizes some of the system's details
while suppressing others.

M. Shaw


Object:

An entity whose behavior is characterized by
the actions that it suffers and that it requires
of other objects.

G. Booch













Common Abstract Object Functions


Create

Creates a descriptor (and maybe an object)
and returns ID

Destroy

Destroys the object descriptor and releases object
resources

Read Descriptor

Interrogates descriptor entries

Modify Descriptor

Changes descriptor entries
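
As a rough illustration of the four common functions listed on this slide, the sketch below keeps descriptors in an in-memory table. The class name `DescriptorTable`, the integer IDs, and the keyword-argument attributes are all assumptions for the example; the slide names only the operations.

```python
import itertools


class DescriptorTable:
    """Hypothetical in-memory table of object descriptors."""

    def __init__(self):
        self._next_id = itertools.count(1)  # source of fresh object IDs
        self._descriptors = {}              # object ID -> attribute dict

    def create(self, **attributes):
        """Create a descriptor and return its ID."""
        object_id = next(self._next_id)
        self._descriptors[object_id] = dict(attributes)
        return object_id

    def destroy(self, object_id):
        """Destroy the descriptor; a real server would also free resources."""
        del self._descriptors[object_id]

    def read_descriptor(self, object_id):
        """Return a copy of the descriptor entries."""
        return dict(self._descriptors[object_id])

    def modify_descriptor(self, object_id, **changes):
        """Change one or more descriptor entries."""
        self._descriptors[object_id].update(changes)


if __name__ == "__main__":
    table = DescriptorTable()
    oid = table.create(owner="alice", length=0)
    table.modify_descriptor(oid, length=4096)
    print(oid, table.read_descriptor(oid))
    table.destroy(oid)
```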





Description of the Model


The Purpose of the Bitfile Client



The Bitfile Client is the programmatic
agent of the user

Converts user desires PPC, system, or 
library calls) into bitfile server, name
server, and mover requests.

Allows the same servers to handle a
variety of standard user interfaces.

The Bitfile Client is system-dependent!





• Provider 

‘ Transiat 

• machii 

: 

• Decoupl 
. manage 

• Can 

m 

• Supi 

• Manage 




Specific Name Server Functions


• Add Entry
    • Inserts a name / ID pair into a directory

• Fetch Entry
    • Given a name, returns the ID from an entry

• Delete Entry
    • Removes one or more entries from a directory

• List
    • Returns some or all of the names in a directory
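
The four name server functions above map naturally onto a dictionary from names to IDs. The following is a minimal sketch under that assumption. The class `Directory`, its method names, and the prefix filter on listing are illustrative choices, not part of the reference model.

```python
class Directory:
    """Hypothetical name server directory: human-oriented name -> bitfile ID."""

    def __init__(self):
        self._entries = {}

    def add_entry(self, name, bitfile_id):
        """Insert a name / ID pair; refuse to overwrite an existing name."""
        if name in self._entries:
            raise KeyError(f"name already present: {name}")
        self._entries[name] = bitfile_id

    def fetch_entry(self, name):
        """Given a name, return the ID from its entry."""
        return self._entries[name]

    def delete_entry(self, *names):
        """Remove one or more entries."""
        for name in names:
            del self._entries[name]

    def list_names(self, prefix=""):
        """Return some (by prefix) or all of the names, sorted."""
        return sorted(n for n in self._entries if n.startswith(prefix))


if __name__ == "__main__":
    d = Directory()
    d.add_entry("report.txt", 1001)
    d.add_entry("results.dat", 1002)
    print(d.fetch_entry("report.txt"), d.list_names("re"))
```

Because the directory stores only IDs, deleting a name does not touch the bitfile itself, which is one way the lost objects and dangling pointers mentioned later in these viewgraphs can arise when name and bitfile servers are independent.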















Creating a Directory "Tree"

[Figure: directory tree diagram; node labels not recoverable from the scan.]


Path Names

[Figure: diagram; recoverable labels: Name Server, Bitfile Server, Bitfile, Device, Migration.]


Problems with Independent Servers



Lost objects

Dangling pointers

Performance is lower when bitfile
server access is necessary

Encryption of IDs may be necessary



Different Naming Environments



• Unix
    • Unstructured names

• DOS
    • Name + extension

• VMS
    • Version numbers


All can use the same Bitfile Servers!


Using a Database as a Name Server

[Figure: diagram; recoverable label: Space Limit Table.]


Bitfile Descriptor Fields


Created and used by Bitfile Client
    Comment, bitfile format

Created by Bitfile Client, used by Bitfile Server
    Account, lifetime, security level, service desired ...

Created by Bitfile Server, used by both
    Statistics, bitfile length, creation time ...

Created and used by Bitfile Server
    Lock information, last migration time ...

Created by Storage Server, used by Bitfile Server
    Bitfile location, last device used


Bitfile Server Functions


Transaction Functions
    Abort, Status

Descriptor Functions
    Create, Destroy, Lock, Modify,

Bitfile Access Functions
    Erase, Retrieve, Store


The Storage Server

[Figure: storage server diagram; labels not recoverable from the scan.]


Storage Server Functions



SS-Allocate

Allocate space and return segment descriptors

SS-Deallocate

Deallocate space and return it to free list

SS-Retrieve

Transfer specified data to client
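
One way to picture these storage server functions is a first-fit allocator over a free list, with each segment descriptor an (offset, length) pair. The sketch below makes exactly those assumptions. The class `StorageServer`, the descriptor shape, and the in-memory `bytearray` standing in for media are all hypothetical, and `ss_store` is included only so that retrieval has something to return.

```python
class StorageServer:
    """Hypothetical storage server backed by an in-memory byte array."""

    def __init__(self, capacity):
        self._media = bytearray(capacity)
        self._free = [(0, capacity)]  # free list of (offset, length) extents

    def ss_allocate(self, length):
        """Allocate space first-fit and return segment descriptors."""
        segments, new_free, remaining = [], [], length
        for offset, size in self._free:
            take = min(size, remaining)
            if take:
                segments.append((offset, take))
                remaining -= take
            if take < size:
                new_free.append((offset + take, size - take))
        if remaining:
            raise MemoryError("not enough free space")  # free list untouched
        self._free = new_free
        return segments

    def ss_deallocate(self, segments):
        """Return segments to the free list (no coalescing in this sketch)."""
        self._free = sorted(self._free + list(segments))

    def ss_store(self, segments, data):
        """Write data across the given segments."""
        if len(data) != sum(length for _, length in segments):
            raise ValueError("data length does not match allocated space")
        position = 0
        for offset, length in segments:
            self._media[offset:offset + length] = data[position:position + length]
            position += length

    def ss_retrieve(self, segments):
        """Transfer the specified data to the caller."""
        return b"".join(bytes(self._media[o:o + n]) for o, n in segments)
```

A bitfile that spans several segments is reassembled by reading its descriptors in order, which is why the bitfile descriptor records the location information created by the storage server.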

























Mover Function


Move

Move data from specified source to sink
Return the number of bits moved
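
The Move function can be sketched as a chunked copy between two stream-like objects. The use of Python file-like objects for source and sink, the function name `move`, and the chunk size are assumptions for illustration; the slide specifies only the behavior and the return value.

```python
import io


def move(source, sink, chunk_size=64 * 1024):
    """Copy all data from source to sink; return the number of bits moved."""
    bits_moved = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of source
            break
        sink.write(chunk)
        bits_moved += len(chunk) * 8
    return bits_moved


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 1000)
    dst = io.BytesIO()
    print(move(src, dst, chunk_size=256))
```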




The Purpose of the Site Manager


Monitors operations 
Collects statistics 

Establishes policy 

Exerts control over policy and site operation 
Set policy parameters 
- Instills logica 
Run diagnostics


Site Manager Functions


• Storage Management
    • Optimizes devices and performance

• Operations
    • Monitors systems and resolves problems

• Systems Maintenance
    • Maintains system component performance and reliability

• Software Support
    • Maintains development and test environment

• Hardware Support
    • Displays, diagnoses, and corrects hardware failures

• Administrative Control
    • Maintains security, accounting, management policies


N93-30451


Optical Media Standards for Industry


Kenneth J. Hallam
ENDL Associates
29112 Country Hills Road
San Juan Capistrano, CA 92675



Optical storage is a new and growing area of technology that can serve to meet some of the mass
storage needs of the computer industry. Optical storage is characterized by information being
stored and retrieved by means of diode lasers. When most people refer to optical storage, they
mean rotating disk media. But there are 1 or 2 products that use lasers to read and write to tape.

Optical media also usually means removable media. Because of its removability, there is a
recognized need for standardization, both of the media and of the recording method.

Industry standards can come about in one or more different ways.

1. An industry supported body can sanction and publish a formal standard. Examples
of such bodies include ANSI, AIIM, ECMA and ISO.

2. A company may ship enough of a product that it so dominates an application or
industry that it acquires 'standard' status without an official sanction. Such de
facto standards are almost always copied by other companies with varying degrees
of success.

3. A governmental body can issue a rule or law that requires conformance to a
standard. The standard may have been created by the government, or adopted from
among many proposed by industry. These are often known as de jure standards.

Standards are either open or proprietary. If approved by a government or sanctioning body, the
standard is open. A de facto standard may be either open or proprietary.

Optical media is too new to have de facto standards accepted by the marketplace yet. The
proliferation of non-compatible media types in the last 5 years of optical market development
has convinced many of the need for recognized media standards.

There are 3 organizations presently working to establish recognized standards for optical
media.

ANSI - The American National Standards Institute, committee X3B11
ECMA - The European Computer Manufacturers Association, committee TC31
ISO - The International Standards Organization, committees SC23 and SC15

Membership in ANSI is open to individuals, organizations and companies.

Membership in ECMA is open to companies that manufacture products in Europe.

Membership in ISO is open only to countries.

All work on the technical committees of all 3 organizations is accomplished by volunteers
from industry. The volunteers pay their own way and spend the time necessary to formulate,
draft and critique proposed standards.

As might be expected, many of the same individuals can be seen on the 3 different optical
committees. The manufacturers of optical media and drives dominate the 3 committees
devoted to optical media. There is a sprinkling of others, including semiconductor
manufacturers, software firms, CPU companies and user organizations. The user community
is probably the most under-represented at these committees.

All 3 committees work hard to keep some level of coordination between each other. The ANSI
committee X3B11 is the place that US optical media standards are developed and the
recommendations for a US national position at ISO are formulated. X3B11 also nominates
delegates for the US to send to the meetings of ISO/SC23.

Because ECMA is a European trade association, there are no direct, formal links between it and
X3B11. However, there are formal and long established links between X3B11 and ISO/SC23.
Since many of the companies that send people to the meetings of X3B11 are also members of
ECMA TC31, the informal communication path is well used. ECMA is also an advisory
member of ISO/SC23 and often makes direct submissions to ISO with proposed international
standards.

When the members of SC23 and X3B11 can agree on all points of a proposed standard, it is
possible to have a joint ANSI/ISO standard, published as a single document. However, there are
often many reasons for divergence within a standard, so we wind up with both an ISO and an
ANSI standard for the same thing.

The members of optical media standards committees are working under a handicap in that
much of the technical data necessary to develop a comprehensive draft is still being discovered
in the laboratory. The fundamental first order physics of optical recording is not as well
understood as that of traditional magnetic recording.

The true test of a proposed standard physical parameter or measurement is: can several people,
working in different laboratories, achieve the same results consistently? It takes tremendous
resources to define and develop a standard. Without the time, materials and equipment
generously provided by the volunteers from industry, it would not be possible.

In spite of the amount of detail present in a media standard, people often ask why the
documents don't go even farther in defining what a piece of standard media looks like. A
standard is not intended as a complete blueprint on how to build a product, nor as a purchase
specification. Instead, it exists as a framework under which it is possible for multiple
manufacturers to build products that have a good chance of true interchange.

Issues like quality and manufacturing process are best left to the market place. There must be
room within a standard for companies to add value and offer their own vision of what the
customer needs.





Existing standards for optical media:


CD-ROM

ISO 9660          This standard defines the file structures on a CD-ROM.
                  Capacity = approx. 550MB

ISO 10149         Defines the physical media and the sector format of CD-ROM.


ISO 9171          Defines a common 130mm (5.25 inch) optical media, cartridge and
                  sector format for 2 independent and non-compatible servo systems
                  (CCS and SSF). Capacity = 320MB/side

ISO 11560         Defines 130mm media that uses magneto-optic recording but is
                  treated as WORM media. Also uses the CCS format. Capacity = approx.
                  320MB/side

ISO 10885         Defines 356mm (14 inch) optical media and cartridge, as well as the
                  sector format. Capacity = 3,400MB/side

ANSI X3.191-1991  Defines 130mm WORM optical media and cartridge that uses the SSF
                  format and RZ recording. Capacity = approx. 650MB/side

ANSI X3.211-1992  Defines 130mm WORM optical media and cartridge that uses the CCS
                  format and RLL 2,7 recording. Capacity = approx. 320MB/side

ANSI X3.214-1992  Defines 130mm WORM optical media and cartridge that uses the SSF
                  format and 4/15 recording. Capacity = approx. 320MB/side

ANSI X3.200-1992  Defines 356mm WORM optical media and cartridge and sector
                  format. Capacity = approx. 3,400MB/side

ANSI X3.220-1992  Defines 130mm media that uses magneto-optic recording but is
                  treated as WORM media. Also uses the CCS format. Capacity = approx.
                  320MB/side


Optical Rewritable (Erasable) Computer Media

ISO 10089         Defines a common 130mm Rewritable media and cartridge with 2
                  non-compatible servo formats (CCS and SSF). Capacity = approx.
                  320MB/side

ISO 10090         Defines 90mm Rewritable/Read-Only optical media and cartridge
                  that uses the CCS format. Already approved by ISO, this standard is
                  being considered for submission as an ANSI standard. Capacity =
                  approx. 128MB (single sided).

ANSI X3.212-1992  Defines 130mm Rewritable optical media and cartridge that uses the
                  CCS format. Capacity = approx. 320MB/side

ANSI X3.213-1992  Defines 90mm Rewritable/Read-Only optical media and cartridge
                  that uses the DBF format. Capacity = approx. 120MB (single sided).





130mm Extended Capacity Media. A simple extension of standards 10089 and X3.212 with
650MB/side capacity and a goal of complete compatibility with first generation media at
the drive level. Also known as the 2X media project. Projects active at ECMA and ISO/SC23.
Est. completion 6/93


130mm Second Generation Media. Anticipated capacity is over 1GB per side. Also known as
the 3X media project. It is active at ISO/SC23 and X3B11. Est. completion 10/94

90mm Second Generation Media. Anticipated capacity is over 300MB on single sided
media. Goal is to offer compatibility with first generation media at the drive level. Est.
completion 10/94

300mm Media. This is actually 2 projects at the present time. There is no agreement
on a single servo method (between CCS and SSF) and there may be separate standards for
plastic and glass media as well. Projects active at ISO/SC23 and X3B11. Est. completion
6/94

File and Volume Structure. This project is intended to affect all sizes of optical media. It
defines the software volume and file structure that is needed to achieve true media
interchange between systems. It has been developed with the needs of UNIX, MS-DOS and
DEC operating systems in mind. It is possible that by the time of this conference, the draft
will have been approved as a standard by ISO or ECMA. The work is essentially complete
and only a review and final votes are needed at both ANSI and ISO. There are active projects
at ISO/SC15, X3B11 and ECMA TC15. Est. completion 12/92


The next generation standards for 90mm and 130mm Rewritable media will not likely go
beyond the stated capacities because of the lack of commercial laser diodes with short wave
lengths and sufficient power for this type of application. The first generation optical media
products had the benefit of a popular consumer application (the music CDs) that caused high
production volumes in 780nm wave length laser diodes. These high volumes led to low cost
components, and the boost in power needed for WORM or Rewritable drives was not too difficult
for the diode manufacturers to make.

To obtain significant increases in bit densities on optical disks, you must write a smaller spot
on the media. The only way to do that economically today is with laser diodes with a shorter
wave length. Laser diodes of at least 670nm wave lengths and power outputs of 30-40mW are
needed. There is no consumer application at present for laser diodes with these characteristics.
This leaves prices for such diodes in the range of several hundred dollars each because of the
low manufacturing volumes.

The demand for optical disk drives for computer applications is simply not high enough to
justify a laser diode manufacturer investing in a lot of capital equipment to make components
for these drives when other opportunities are present.

This situation may change as consumer applications open up for shorter wave length diodes.
Another possibility is the use of "trick" optics that act as frequency doublers. Double the light
frequency and halve the wave length. The only problem is that at present, these techniques are
not very efficient. The original power level is reduced by as much as 90%.

So, given the existing limits of technology, most optical storage media can be expected to go to
2X or 3X over present capacity limits in the next few years. These capacity gains are
achieved by a combination of code techniques and track density improvements.
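
To put rough numbers on the wavelength argument, the sketch below assumes a diffraction-limited spot whose diameter scales with wavelength for fixed optics, so that areal density scales with the inverse square of wavelength. That scaling rule is an assumption brought in for illustration and is not stated in the paper. The function names are hypothetical; the 780nm and 670nm wave lengths, the 30-40mW power range, and the 90% loss figure come from the text above.

```python
def density_gain(old_nm, new_nm):
    """Areal density ratio, assuming spot area scales with wavelength squared."""
    return (old_nm / new_nm) ** 2


def doubled_wavelength(nm):
    """Frequency doubling halves the wavelength."""
    return nm / 2


def source_power_needed(output_mw, fraction_retained):
    """Laser power required before the doubler to deliver output_mw after it."""
    return output_mw / fraction_retained


if __name__ == "__main__":
    # Moving from 780nm to 670nm diodes under the assumed scaling rule.
    print(round(density_gain(780, 670), 2))
    # A frequency-doubled 780nm diode writes at 390nm.
    print(density_gain(780, doubled_wavelength(780)))
    # Delivering 30mW when only 10% of the power survives the doubler.
    print(round(source_power_needed(30, 0.10)))
```

Under these assumptions a shorter wave length diode alone gives well under a 2X gain, which is consistent with the paper's point that the expected 2X to 3X improvements also rely on coding and track density changes.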





N93-30452

Technology for National Asset Storage Systems

Robert A. Coyne

Harry Hulen

IBM Federal Systems Company
3700 Bay Area Blvd., Houston, TX 77058

Richard Watson

Lawrence Livermore National Laboratory
P.O. Box 808, Livermore, CA 94550






Abstract


An industry-led collaborative project, called the National Storage Laboratory, has been
organized to investigate technology for storage systems that will be the future repositories for
our national information assets. Industry participants are IBM Federal Systems Company,
Ampex Recording Systems Corporation, General Atomics DISCOS Division, IBM ADSTAR,
Maximum Strategy Corporation, Network Systems Corporation, and Zitel Corporation.
Industry members of the collaborative project are funding their own participation. Lawrence
Livermore National Laboratory through its National Energy Research Supercomputer Center
(NERSC) will participate in the project as the operational site and the provider of applications.
The expected result is an evaluation of a high performance storage architecture assembled
from commercially available hardware and software, with some software enhancements to
meet the project's goals. It is anticipated that the integrated testbed system will represent a
significant advance in the technology for distributed storage systems capable of handling
gigabyte class files at gigabit-per-second data rates. The National Storage Laboratory was
officially launched on May 27, 1992, when executives of the six founding industrial
participants and John H. Nuckolls, Director of Lawrence Livermore National Laboratory,
signed an agreement with Admiral James D. Watkins, Secretary, U. S. Department of Energy.

1. The High Performance Data Storage Environment

In recent years, transferring and storing information has become a major challenge in the
high performance computing arena. Scientists at Livermore and other research labs now wait
for hours - and even days - to retrieve their supercomputer data. While today's storage systems
can move ten to twenty million bits of information each second, requirements exist today for
architectures with capacity of 100 million to one billion bits per second.
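
The gap between these data rates is easier to see as transfer times. The sketch below computes the time to move a single file at each rate quoted above. The one gigabyte file size (taken as 10^9 bytes) is an assumption chosen to match the "gigabyte class files" mentioned in the abstract, and the calculation ignores protocol overhead and device latency.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time: file size in bits divided by the data rate."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    one_gigabyte = 10**9  # assumed example file size
    for label, rate in [
        ("10 million bits/s", 10 * 10**6),
        ("20 million bits/s", 20 * 10**6),
        ("100 million bits/s", 100 * 10**6),
        ("1 billion bits/s", 10**9),
    ]:
        print(f"{label}: {transfer_seconds(one_gigabyte, rate):.0f} s")
```

At the lower rates a single gigabyte file takes several minutes to move under ideal conditions, so data sets made up of many such files plausibly account for waits of hours.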

Excellent research and development is taking place in the processors, communications
networks, and media required to handle very large volumes of data at very high data rates.

• Several gigabit class fiber optic networks are planned that enable cross-continent access to
high rate/high volume data.

• ANSI has taken the lead in providing high performance interconnect standards such as
HIPPI, FDDI, IPI, SCSI, and the forthcoming Fiber Channel Standard that enable vendors to
design subsystems that can work together at high data rates.

• Data striping and RAID technology have leveraged system performance an order of
magnitude beyond the performance of individual devices.

• Distributed storage systems, such as the Andrew File System (AFS)* that originated at
Carnegie-Mellon University, offer the prospect of nationwide and worldwide storage
systems which present a single file system image to the users.

• The IEEE has established a Storage System Standards Working Group to bring users and
vendors together to address the creation of a standards-based framework to allow
subsystems and components to fit together.

The national asset storage system of the near future is much more than an aggregation of high
capacity components. New concepts are required to bring the available component
technologies together into useful, manageable systems.

2. The Need

There are significant needs that must be addressed if the goal is to create nationwide high
performance distributed storage systems.


2.1 Network-Attached High Performance Storage

Operational storage systems at the national laboratories and supercomputer
centers use general purpose computers as storage servers. These storage servers connect to
storage units such as disks and tapes and serve as intermediaries in passing data to compute
intensive nodes on their networks. At the Livermore Computer Center, for example, disks and
tapes are connected to an Amdahl mainframe that honors requests from users on other
systems, primarily Cray supercomputers, to read or write data to the storage devices. As the
data rates of storage devices increase, the storage server must handle correspondingly faster
communications links such as HIPPI today and Fiber Channel Standard in the near future.
This tends to drive the storage server computer into the supercomputer class, based on the
required data bandwidth.

An alternative is to attach the storage devices (or to be precise, their control units) directly to
the network. This would eliminate the need to store and forward data through a general
purpose mainframe or supercomputer functioning as storage server. There is precedent for
this. At the National Center for Atmospheric Research, there has existed for several years a
storage system that uses an IBM mainframe computer to set up storage transfer requests but
then transfers the data directly from a storage device to a Cray supercomputer. A specially
designed component of the communications system built by Network Systems Corporation
serves to transfer the data between the storage unit and the supercomputer. Therefore, the data
does not have to flow through the IBM mainframe. Instead, the IBM mainframe functions
more as storage manager than storage server in the traditional sense. More recently,
Maximum Strategy Corporation and IBM have offered disk arrays capable of HIPPI
connectivity that have the potential to serve as network attached storage devices.

2.2. Multiple, Dynamic, Distributed Storage Hierarchies

Current storage systems including General Atomics' DataTree and UniTree** and IBM's System
Managed Storage provide a single hierarchy of storage media. In this single hierarchy,
frequently used data is kept on disk, less frequently used data is kept in an automated tape
library, and infrequently used data is kept in tape vaults.

* AFS is a registered trademark of Transarc Corporation.

** DataTree and UniTree are trademarks of General Atomics.

With the availability of new media such as solid state disk, disk arrays, and helical scan tape,
there is frequently no single hierarchy which can be applied to all data and all media.

Consider the problem of technology insertion. This may take the form of adding a new type of
data storage device into an existing system, a type of tape, for example. Further assume that
the existing storage devices will continue to be used. Now, what was a simple disk-to-tape
hierarchy becomes as many as three or four hierarchies: disk to old tape, disk to new tape, old
tape to new tape, and perhaps even new tape to old tape. Current storage systems do not have
the mechanisms to handle this level of complexity.
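
The growth in the number of hierarchies can be illustrated with a short sketch. The Python below is only an illustration and is not part of any storage system described here; the helper name `migration_paths` and the rule that data never migrates back to disk are assumptions made for the example.

```python
from itertools import permutations

def migration_paths(media, never_destination=("disk",)):
    """List ordered (source, destination) pairs among the given media.

    Assumption for this sketch: media named in never_destination (here
    the fastest tier, disk) are never the target of a migration.
    """
    return [(src, dst) for src, dst in permutations(media, 2)
            if dst not in never_destination]

# Before technology insertion: a single disk-to-tape hierarchy.
before = migration_paths(["disk", "old tape"])

# After adding a new tape type while keeping the old one in service.
after = migration_paths(["disk", "old tape", "new tape"])

print(len(before), before)
print(len(after), after)
```

Under these assumptions one path exists before the insertion and four afterwards, which corresponds to the "three or four hierarchies" listed above; dropping the new-tape-to-old-tape pair gives three.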

Also, consider what happens when a system grows to regional or national scale. Multiple
centers, each with disks and tapes, must cache recently accessed data to high performance
media and migrate less recently accessed data to less expensive media. Usually this is done
locally, but occasionally there may be requirements to migrate data from disks (or tapes) at
one location to tapes at another location. In a system that recognizes only one hierarchy,
extensive human supervision and intervention is required to handle inter-datacenter
migration.

Multiple hierarchies are needed, based on such factors as location, data type, cost, and project
affiliation. Each hierarchy must be adaptable to meet specific requirements and must be able
to change over time under the control of a system administrator. The concept of multiple
dynamic hierarchies is described in more detail in an IBM FSC research paper⁶ and is being
proposed to the IEEE Storage Systems Standards Working Group.
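
One way to picture multiple dynamic hierarchies is as a table of rules that an administrator can edit while the system runs. The following Python sketch is a hypothetical illustration only; the class names, attribute keys, and tier names are assumptions and do not come from the IBM paper or from any IEEE draft.

```python
from dataclasses import dataclass, field

@dataclass
class Hierarchy:
    name: str
    tiers: list                                 # ordered from fastest to cheapest
    match: dict = field(default_factory=dict)   # attribute values this rule requires

    def applies_to(self, attrs):
        # A rule applies when every attribute it names has the required value.
        return all(attrs.get(key) == value for key, value in self.match.items())

class HierarchyTable:
    """Ordered rule list; the first matching hierarchy is selected."""

    def __init__(self):
        self.rules = []

    def add(self, hierarchy):
        self.rules.append(hierarchy)

    def select(self, attrs):
        for hierarchy in self.rules:
            if hierarchy.applies_to(attrs):
                return hierarchy
        raise LookupError("no hierarchy matches %r" % (attrs,))

table = HierarchyTable()
climate = Hierarchy("climate", ["disk array", "new tape"],
                    {"project": "climate", "location": "site A"})
default = Hierarchy("default", ["disk", "old tape", "tape vault"])
table.add(climate)
table.add(default)   # an empty match acts as the catch-all rule

chosen = table.select({"project": "climate", "location": "site A"})
fallback = table.select({"project": "fusion", "location": "site B"})

# "Dynamic": the administrator changes a hierarchy without restarting anything.
climate.tiers.append("tape vault")
```

The point of the sketch is that the choice of hierarchy depends on attributes of the data (location, project, and so on) rather than being fixed for the whole system, and that each hierarchy can be changed over time.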

2.3. Layered Access to Storage System Services

In distributed systems, storage services should be presented at several layers of abstraction. At
the higher levels, the services should provide a fully transparent file abstraction. At this level,
concurrent access is synchronized, caching must detect conflicting read/write patterns by
multiple clients, and access and modification times are tracked with timestamps. These are
common characteristics taken for granted when using a file.

At a lower level, where there is less need for transparency, the user may need to work with
objects such as disk or tape blocks. These objects are provided without synchronization,
caching, access records, etc. At an even lower level, individual devices become visible,
requiring even more intimate knowledge of the semantics of the storage media. For example,
the user must take into consideration such characteristics as write-once versus read/write and
sequential versus random access. It is common to build important classes of applications,
such as database management systems, directly on these lower level abstractions.

At the higher levels, the fully transparent file abstraction is more convenient for most end-user
applications. Transparencies such as location independence can make applications much
easier to write and allow portability of both application and storage objects. But these extra
components of the abstraction impose significant performance penalties that some
applications cannot afford. For example, a database application is intimately tied to its
storage access pattern and, given the appropriate low level access, can optimize its accesses
much more effectively than a generic caching file system. The database application must be
given access to the lower level abstraction of disk blocks, and it must be more sophisticated to
access and manage its storage at the lower level. Other clients that are more interested in
high performance than in the file abstraction might include a paging system that needs the
highest level of performance with minimum overhead from networked solid-state disks, or a
satellite downlink that receives data at such a high rate that it cannot pass through a disk
buffer but must be written directly to the raw tape devices for later processing.

A national resource file system must provide levels of abstraction appropriate for diverse
classes of applications. These will range from the fully abstract and highly transparent
interfaces for the general user to low level services for use by sophisticated applications that
are willing to obtain high performance in exchange for a more complex storage abstraction. It
is essential that a storage system have the appropriate modularity to support multiple layers of
storage abstractions over a range of user interfaces, file naming conventions, storage
hierarchies, network connected devices, and storage management strategies.
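
The layering described in this section can be sketched in a few lines of Python. This is a hypothetical illustration, not an interface of UniTree or of any system named in this paper: `BlockStore` stands for the lower layer (blocks without caching or access records) and `FileLayer` for the higher layer, which here adds only naming and timestamps and omits synchronization and caching.

```python
import time

class BlockStore:
    """Lower layer: fixed-size blocks, no caching, no access records."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self._blocks = {}

    def write_block(self, number, data):
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        self._blocks[number] = bytes(data)

    def read_block(self, number):
        return self._blocks.get(number, b"")

class FileLayer:
    """Higher layer: named files with access and modification times."""

    def __init__(self, store):
        self.store = store
        self._files = {}      # name -> {"blocks": [...], "mtime": t, "atime": t}
        self._next_block = 0

    def write(self, name, data):
        size = self.store.block_size
        blocks = []
        for offset in range(0, len(data), size):
            self.store.write_block(self._next_block, data[offset:offset + size])
            blocks.append(self._next_block)
            self._next_block += 1
        now = time.time()
        self._files[name] = {"blocks": blocks, "mtime": now, "atime": now}

    def read(self, name):
        meta = self._files[name]
        meta["atime"] = time.time()   # the file layer keeps access records
        return b"".join(self.store.read_block(n) for n in meta["blocks"])

store = BlockStore(block_size=4)
files = FileLayer(store)
files.write("results", b"hello world")
whole_file = files.read("results")     # general user: file abstraction
first_block = store.read_block(0)      # sophisticated client: raw blocks
```

A database or paging system in the sense of the text would talk to `BlockStore` directly and accept the extra bookkeeping, while an end-user application would use `FileLayer`.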

2.4. Storage System Management

Storage system management is the collection of functions concerned with control,
coordination, monitoring, performance and utilization of the storage system. These functions
are often interdependent, involve human decision making, and span multiple servers.
Management functions may be implemented as stand-alone programs, integrated with other
storage system software, or implemented as policy. The Storage System Manager can be
thought of as the collection of management processes that performs all necessary storage
system management functions.

Storage system management attempts to allocate the resources of the storage system to the best
use for the overall benefit of the site. Policies for the site must be set, and manual and
automatic procedures must be developed to implement those policies. The procedures must be
adaptable because the requirements will change as time progresses and because the same
software may be run at a number of different sites.

Current storage system software packages provide tools needed to manage individual sites. As
gigabit networks blur the distinction between local and remote, and as national storage
systems are created, the issue of system management becomes of great concern.


3. The Collaboration 

Six companies with interests in the technology for very high performance storage systems
joined together with Lawrence Livermore National Laboratory to found the collaborative
research project. The collaboration is defined by a set of Cooperative Research and
Development Agreements (CRADAs) between the industrial participants and the Department of
Energy. The National Storage Laboratory was officially launched on May 27, 1992, when
executives of the six founding industrial participants and John H. Nuckolls, Director of
Lawrence Livermore National Laboratory, signed an agreement with Admiral James D.
Watkins, Secretary, U.S. Department of Energy. The collaboration is self-funded, with
participants providing both equipment and labor. The roles of the founding participants are:

• IBM Federal Systems Company is serving as systems integrator and project coordinator.
IBM Federal Systems Company is providing RISC System/6000 computers, and IBM
ADSTAR is providing a 20 gigabyte HIPPI-attached high performance disk array.

• Ampex Recording Systems Corporation is contributing technology and equipment for a
HIPPI-attached very high speed, high capacity cartridge tape library system.

• General Atomics DISCOS Division is providing UniTree storage system software that will
serve as the framework and point of departure for the software capabilities to be developed.





• Maximum Strategy Corporation is working with IBM and Ampex to configure very high rate
tape and disk control units capable of network attachment. 

• Network Systems Corporation is providing expertise in network design and is supplying
network switches, routers, and gateways. 

• Zitel Corporation is providing a HIPPI-attached solid state memory device capable of data
rates limited only by network performance. 

• Lawrence Livermore National Laboratory, through its National Energy Research 
Supercomputer Center, is serving as the operational environment and host site for the 
collaborative project and the source for applications.

Four National Science Foundation laboratories have joined the NSL as members of the 
Executive Committee:

• Cornell Theory Center 

• National Center for Supercomputing Applications

• Pittsburgh Supercomputing Center 

• San Diego Supercomputer Center

Since its founding, the collaboration has been joined by three other contributing companies:

• CHI Systems, Inc. is providing HIPPI adapters for several of the computing and I/O
subsystems.

• IGM-ATL, Inc. is providing a SCSI-attached 8mm tape library system.

• PsiTech Inc. is providing a HIPPI-attached high performance frame buffer.

It is expected that some additional growth in collaboration membership will occur, and indeed 
will be welcome. Growth in the collaboration is expected to be vertical, with new members 
offering a hardware, system software, or application interest that does not duplicate and is 
complementary to the interests of the existing membership.

4. The IEEE Storage Systems Standards Working Group

An important forum for users, developers, researchers, and suppliers to come together to work
on mass storage system issues has been the IEEE Storage Systems Standards Working Group.
This standards body, organized in May 1990, will provide guidelines and standards for
scalable, distributed, multivendor storage systems. All of the member organizations of the
joint research project are members of the IEEE Storage Systems Standards Working Group.
The four basic tasks of this study, network-attached storage, multiple hierarchies, layered
protocols, and storage system management, are areas of interest in the Standards Working
Group.


5. Overview of the Prototype Storage System

The collaborative project will develop an operational prototype to be called the National
Storage Laboratory. Figure 1 shows the prototype system as it is envisioned toward the end of
1992. This prototype will augment existing systems in the National Energy Research
Supercomputer Center (NERSC) and the Open Computing Facility (OCF), both of which are
located in the Livermore complex. The National Storage Laboratory will be used by LLNL
scientists in their work in climatology and fusion research.

The storage system, as shown in Figure 1, is conceptually divided into four functional groups of
equipment. These groups represent the computational resources, the user areas, the network
storage devices, and the components that provide access control and storage system
management. Various networks are shown connecting the various components to provide
separation of control functions and data movement functions. These functional groups of
processors, storage devices, and networks are described in the following paragraphs.

5.1. LLNL Computational Resources

Closely tied to the collaborative research project are the computational engines that are both
the producers and consumers of massive quantities of data. The prototype storage system will
be connected to CRAY-2 and CRAY Y-MP C-90* supercomputers at NERSC. Representing
another class of computational resource will be a cluster of IBM RISC System/6000**
workstations that comprise LLNL's Open Computing Facility. The computational complexes
will be connected to the pool of network-attached storage resources by the Direct Data Transfer
Network and to the access and control mechanisms by a Storage Access Control Network
functional group. Each computational system has its own private storage and will continue to
be connected to existing shared storage systems at NERSC and OCF, which for simplicity are not
shown in Figure 1. Also, there are existing facility networks that provide access to users of
these computational complexes; these networks are also not shown in Figure 1.


5.2. User Areas

Users of the prototype storage system will have their own deskside UNIX workstations and will
be connected via existing facility networks to the NERSC and OCF computational complexes.
These workstation users will expect to use standard network file services such as NFS or AFS.
Consequently, the design of the storage system prototype includes a Secondary Storage Server
that provides an NFS or AFS compatible file system managed by an IBM RISC System/6000
computer. The Secondary Storage Server is in turn connected to the Network Storage
Resources via the Direct Data Transfer Network and to the Access Control and Management
functional group via the Storage Access Control Network. Local disk arrays on the Secondary
Storage Server provide speed matching and caching between the high performance network
storage resources and the workstations. Also shown in the User Area functional group is a
Frame Buffer. The Frame Buffer is connected to the Direct Data Transfer (HIPPI) network and
is capable of displaying movie-like sequences of high resolution images from either the storage
resources or the computational resources.

Corresponding to the prototype's single User Area, as shown in Figure 1, a future full
implementation of these concepts would contain many such user area functional groups, some
locally attached and some remote. Each would have access to a local Secondary Storage Server
and perhaps to a Frame Buffer. The Secondary Storage Server itself may be a basic
workstation with a few large disks, or it may be a mainframe or a specialized storage system
product.


* CRAY Y-MP C90 and CRAY-2 are trademarks of Cray Research, Inc.

** IBM RISC System/6000 is a trademark of International Business Machines Corporation.






Figure 1. Conceptual Overview of the National Storage Laboratory at Lawrence 
Livermore National Laboratory 










The Open Computing Facility's RISC System/6000 Cluster, which is shown in Figure 1 as a
single box without internal detail, will be in many ways like the User Area, with one of the
RISC System/6000s serving as a Secondary Storage Server for the other members of the
cluster. The main difference between the RISC System/6000 cluster and the User Area
functional group is in the way the systems are used. The User Area computers are interactive
workstations usually dedicated to an individual or a small group of people, while the
workstations in the Open Computing Facility cluster are compute servers to which batch jobs
are scheduled. Thus, the User Area functional group represents a building block or "generic
object" that can be replicated and employed in different ways.

5.3. Network Storage Resources

One of the objectives of this research project is to gain experience with the concept of
network-attached storage devices. Conceptually, network-attached storage devices can be shared by
several processors while not being "owned" in a conventional sense by any of them. The
functional group in Figure 1 labeled Network Storage Resources represents the high
performance network-attached storage devices planned for the prototype storage system. They
are a solid state memory device from Zitel, a disk array from IBM, and a robotic tape system
from Ampex. The Ampex tape system will use a 19 millimeter helical scan tape format called
DD-2. All three of these storage units have intelligent controllers that are connected to both
the Direct Data Transfer network using a HIPPI interface and to the Storage Unit Control
network via a conventional local area network interface. The controllers for the IBM disk
array and the Ampex tape subsystem are from Maximum Strategy Corporation, a member of
the collaborative project.

Commands to direct the Network Storage Resources components to send and receive data will
be sent from system components represented by the Access Control and Management
functional group in Figure 1. Data will be transferred directly to the components represented
by the LLNL Computational Resources functional group and the User Area functional group via
the high speed Direct Data Transfer Network. Thus, there is a separation of control and data
functions. In the prototype, this separation of control and data, which is a logical concept, will
be enforced by a physical separation of the control and data networks.

5.4. Access Control and Management

In most existing storage systems, both control and data pass through a storage system
processor that is often a mainframe. The NERSC, for example, uses a mainframe-based
storage system in which all data passes through the memory of the storage system processor.

For the storage system prototype as shown in Figure 1, the controlling entity will be a
workstation class computer. This is true even though the data rates to and from the storage
devices are an order of magnitude faster than in existing LLNL storage systems. Such a design
is possible because the data will flow directly between the user and the device, not through the
controlling entity. This is enforced in the prototype by the decision that the Access Control
and Management components are not connected to the Direct Data Transfer Network.

The components of the prototype Access Control and Management System will be two IBM
RISC System/6000 computers. Conceptually, this complex could grow horizontally to more
workstations, or vertically to a mainframe. Also, the access control and storage system
management functions could be combined into one processor.

The Access Control and Management complex is where most of the software development will
take place for the collaborative project. The Access Control functions will be based on
UniTree, which is a product of General Atomics DISCOS Division, a member of the
collaborative project. The storage system management functions will be new.

Operationally, the UniTree component will receive requests to store or retrieve data over the
Storage Access Control Network. The requests may originate from one of the high
performance computational entities shown in the LLNL Computational Resources functional
group or from the Secondary Storage Server shown in the User Area. The modified UniTree
software will translate the request into commands directing one of the devices in the Network
Storage Resources functional group to send data to the requestor or receive data from the
requestor. These commands are sent over the Storage Unit Control network.
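
The request flow just described, in which the controlling workstation handles only control messages while the data moves directly between device and requestor, can be sketched as follows. This Python fragment is a hypothetical illustration; the class and method names are assumptions and are not UniTree interfaces.

```python
class Requestor:
    """A supercomputer or Secondary Storage Server on the data network."""

    def __init__(self, name):
        self.name = name
        self.received = {}

    def accept(self, key, data):
        # Data arrives here directly from the storage device.
        self.received[key] = data

class StorageDevice:
    """A network-attached device with an intelligent controller."""

    def __init__(self, contents):
        self.contents = dict(contents)

    def command_send(self, key, requestor):
        # The command arrives on the control network; the data leaves on the
        # data network, straight to the requestor.
        requestor.accept(key, self.contents[key])

class AccessController:
    """Workstation-class controller: sees requests and commands, never data."""

    def __init__(self, catalog):
        self.catalog = catalog     # key -> device holding the data
        self.bytes_relayed = 0     # never incremented in this design
        self.log = []

    def retrieve(self, key, requestor):
        device = self.catalog[key]
        self.log.append(("send", key, requestor.name))
        device.command_send(key, requestor)

tape = StorageDevice({"run42": b"simulation output"})
controller = AccessController({"run42": tape})
cray = Requestor("cray")
controller.retrieve("run42", cray)
```

Because `AccessController` only looks up the device and issues a command, its workload depends on the number of requests rather than on the volume of data, which is why a workstation can control devices that are an order of magnitude faster than those in the existing systems.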

5.5. Networks

The principal logical networks are a Direct Data Transfer Network, a Storage Control Network
(shown in Figure 1 as separate Storage Unit Control and Storage Access Control networks), and
various facility networks.

The initial component of the Direct Data Transfer Network is a HIPPI switch from Network
Systems Corporation, a member of the collaborative project. HIPPI stands for High
Performance Parallel Interface and is an ANSI standard. There will be a fiber extender to allow
connection to the Open Computing Facility's RISC System/6000 cluster, which is in another
building and is well beyond HIPPI distance limitations. There will be provision for future
attachment to remote networks through T3 and SONET, which will be studied following the
initial phase of this project.

The Storage Access Control Network and the Storage Unit Control Network are the two control
networks that are part of the prototype implementation. FDDI technology is the design point
for these networks. However, the networks will initially be a mixture of FDDI, Ethernet, and
HYPERchannel* with a Network Systems Corporation router bridging the technologies.

Existing networks at LLNL connect components within the NERSC and the OCF and connect
these complexes to the user areas. These existing networks will form an integral part of the
overall system as seen by the user. They will provide access to existing storage systems and
will remain in place throughout this study. They will provide connectivity between the user
workstations and the central computational resources.

6. Applications 

A fundamental aspect of the National Storage Laboratory's philosophy is to use appropriate
scientific applications to help set priorities and test and demonstrate the concepts embedded
within the system architecture and implementation. Three application domains have been
chosen by Lawrence Livermore National Laboratory to test and demonstrate the system's effect
on scientific productivity:

6.1. Climatic Models 

The Program for Climate Model Diagnosis and Intercomparison (PCMDI) has as its goal to
understand why different climate models produce results that differ from each other and
from actual climate measurement data. PCMDI currently needs access to very large files and
multiple datasets for a variety of post-processing analyses. The NSL architecture is expected to
reduce data transfer times from hours to minutes.

* HYPERchannel is a trademark of Network Systems Corporation.

6.2. Magnetic Fusion Energy Models

The Magnetic Fusion Energy (MFE) program involves extensive
computer simulation modeling as well as experimental studies. It is common in their
modeling studies to fill the supercomputer disks with intermediate and final results and not
be able to proceed until this data can be transferred to shared tertiary storage. This can cause
delays of minutes to hours before additional runs can proceed.

6.3. Digital Imaging 

Many scientific modeling calculations generate sequential digital images which are stored,
retrieved, and viewed as motion pictures, known as "movie loops." In preparing these movie
loops, scientists need to edit and evaluate the effectiveness of various generated images.
Currently users must wait long periods while these movies are output to slow video media.
With the NSL testbed's high performance frame buffer and high resolution display, together
with the high performance data storage and retrieval capability, users will be able to store
these movie loops directly on digital storage and play them back in real time.

References 

1. Patterson, D., G. Gibson, and R. Katz, "A Case for Redundant Arrays of Inexpensive Disks
(RAID)," ACM SIGMOD, Chicago, pp. 109-116, June 1988.

2. Daniel Nydick et al., "An AFS-based Mass Storage System at the Pittsburgh
Supercomputing Center," Digest of Papers, Proc. Eleventh IEEE Symposium on Mass
Storage Systems, pp. 117-122, October 1991.

3. IEEE Program Action Request 1244, using a base document from the IEEE Technical
Conference on Mass Storage Systems and Technology, "Mass Storage Systems Reference
Model," May 1990.

4. Hogan, Carole, et al., "The Livermore Distributed Storage System: Requirements and
Overview," Digest of Papers, Proc. Tenth IEEE Symposium on Mass Storage Systems, pp. 6-17,
May 1990.

5. Merrill, John and Erich Thanhardt, "Early Experience with Mass Storage on a UNIX-based
Supercomputer," Digest of Papers, Proc. Tenth IEEE Symposium on Mass Storage Systems,
pp. 117-121, May 1990.

6. Buck, A. L., and Robert A. Coyne, "Dynamic Hierarchies and Optimization in Distributed
Storage Systems," Digest of Papers, Proc. Eleventh IEEE Symposium on Mass Storage
Systems, pp. 85-91, October 1991.





N93-30453

The Visible Human Project of the National Library of Medicine:
Remote access and distribution of a multi-gigabyte data set





As part of the 1986 Long-Range Plan for the National Library of Medicine (NLM) [1], the
Planning Panel on Medical Education wrote that NLM should "...thoroughly and
systematically investigate the technical requirements for and feasibility of instituting a
biomedical images library." The panel noted the increasing use of images in clinical practice
and biomedical research. An image library would complement NLM's existing bibliographic
and factual database services and would ideally be available through the same computer
networks as are these current NLM services.

Early in 1989, NLM's Board of Regents convened an ad hoc planning panel to explore possible
roles for the NLM in the area of electronic image libraries. In its report to the Board of Regents
[2], the NLM Planning Panel on Electronic Image Libraries recommended that "NLM should
undertake a first project building a digital image library of volumetric data representing a
complete, normal adult male and female. This Visible Human Project will include digitized
photographic images for cryosectioning, digital images derived from computerized
tomography and digital magnetic resonance images of cadavers."

The technologies needed to support digital high resolution image libraries, including rapid
development; and that NLM encourage investigator-initiated research into methods for
representing and linking spatial and textual information, structural informatics [3].

The first part of the Visible Human Project is the acquisition of cross-sectional CT and MRI
digital images and cross-sectional cryosectional photographic images of a representative male
and female cadaver at an average of one millimeter intervals. The corresponding
cross-sections in each of the three modalities are to be registerable with one another. A two year
contract for acquisition of this data was awarded in August 1991 to the University of Colorado
at Denver. Victor M. Spitzer, Ph.D. and David G. Whitlock, M.D., Ph.D. are the principal
investigators.

Under the terms of the data collection contract, the cryosectional data will be returned to NLM
as photographs of the cross-sections. But the goal of the Visible Human Project is a digital
image library. Towards the end of the summer of 1992, a Request for Proposals (RFP) will be
issued for the digitization of the cryosectional photographs. That contract is to be awarded in
the early spring of 1993. When the Visible Human Project was conceived in 1989 it was
predicted that the best resolution that could be expected from this digitization process would be
2,000 pixels by 2,000 pixels in 24 bit color. It is now thought that a resolution of 3,000 pixels by
3,000 pixels or even higher should be required.

Assuming a resolution of 512 pixels by 512 pixels by 12 bits of gray tone for the CT and MRI
data, and 2,000 pixels by 2,000 pixels by 24 bit color for the cryosectional data, the image part
of the Visible Human data set will comprise approximately 50 gigabytes of uncompressed data.
This would correspond to more than 75 CD-ROMs. Increasing the resolution of the
cryosectional images to 3,000 pixels by 3,000 pixels would increase the size of the image library
to about 110 gigabytes.
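
The arithmetic behind these estimates can be reproduced with a short Python sketch. The per-image sizes follow directly from the resolutions quoted above. The number of cross-sections is not stated in the text, so `ASSUMED_SECTIONS` below is a hypothetical value (two cadavers of roughly 1,800 mm sectioned at one millimeter intervals), as is the assumption of one CT, one MRI, and one cryosection image per section.

```python
def image_bytes(width, height, bits_per_pixel):
    """Uncompressed size of one image in bytes."""
    return width * height * bits_per_pixel // 8

ct_or_mri = image_bytes(512, 512, 12)     # 393,216 bytes per CT or MRI image
cryo_2k = image_bytes(2000, 2000, 24)     # 12,000,000 bytes per cryosection
cryo_3k = image_bytes(3000, 3000, 24)     # 27,000,000 bytes per cryosection

# Hypothetical section count; not given in the source text.
ASSUMED_SECTIONS = 3600

def data_set_bytes(sections, cryo_image_bytes):
    # Assumes one CT, one MRI, and one cryosection image per section.
    return sections * (2 * ct_or_mri + cryo_image_bytes)

total_2k = data_set_bytes(ASSUMED_SECTIONS, cryo_2k)
total_3k = data_set_bytes(ASSUMED_SECTIONS, cryo_3k)

print(total_2k / 1e9, "GB at 2,000 x 2,000")
print(total_3k / 1e9, "GB at 3,000 x 3,000")
print(cryo_3k / cryo_2k, "times larger per cryosection image")
```

With the assumed section count the totals come to roughly 46 and 100 gigabytes, the same order as the figures of about 50 and 110 gigabytes quoted above. The cryosectional images dominate the total, and each grows by a factor of 2.25 at the higher resolution.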







A distribute can be used to find pictures, and pictures can be used as an index into relevant text
are being experimented with. Basic research is needed in the description and representation of
structures, and the connection of structural-anatomical to functional-physiological
knowledge. This is the larger, long term goal of the Visible Human Project: to produce a system
of knowledge structures which will transparently link visual knowledge forms to symbolic
knowledge formats, so that the print library and the image library become one unified resource
for medical information.


[1] National Library of Medicine (U.S.) Board of Regents. Long Range Plan: Report
of the Board of Regents. U.S. Department of Health and Human Services, Public
Health Service, National Institutes of Health, 1987.

[2] National Library of Medicine (U.S.) Board of Regents. Electronic Imaging:
Report of the Board of Regents. U.S. Department of Health and Human Services,
Public Health Service, National Institutes of Health, 1990. NIH Publication 90-2197.

[3] Brinkley, J.F. "Structural informatics and its applications in medicine and
biology." Academic Medicine, 1991, 66:589-591.





N93-30454


DATA MANAGEMENT IN NOAA


William M. Callicott
NOAA/NESDIS
IB4 Room 3316 06D/5

miiHurf.Mn 20333 







ABSTRACT

The NOAA archives contain 150 terabytes of data in digital form, most of which are the high
volume GOES satellite image data. There are 630 data bases containing 2,350 environmental
variables. There are 375 million film records and 90 million paper records in addition to the
digital data base. The current data accession rate is 10% per year and the number of users is
increasing at a 10% annual rate. NOAA publishes 5,000 publications and distributes over one
million copies to almost 41,000 paying customers. Each year, over six million records are key
entered from manuscript documents and about 13,000 computer tapes and 40,000 satellite
hardcopy images are entered into the archive. Early digital data were stored on punched cards
and open reel computer tapes. In the late seventies, an advanced helical scan technology
(AMPEX TBM) was implemented. Now, punched cards have disappeared, the TBM system was
abandoned, most data stored on open reel tapes have been migrated to 3480 cartridges, many
specialized data sets have been distributed on CD ROMs, special archives are being copied to 12
inch optical WORM disks, 5 1/4 inch magneto-optical disks have been employed for
workstation applications, and 8 mm EXABYTE tapes are planned for major data collection
programs. The rapid expansion of new data sets, some of which constitute large volumes of
data, coupled with the need for vastly improved access mechanisms, portability, and improved
longevity are factors which will influence NOAA's future systems approaches for data
management.






Data is the product of investigation. In science and technology, data is all that is left after the
budget is spent and the opportunity is gone. Years later, some data are "dusted off" and made a
part of research resulting in substantially more impact than the original use. For example,
the "ozone hole" gleaned from old NIMBUS satellite data certainly had an impact far greater
than the original investigators could ever have dreamed.

Also, experiments and their individual data collections eventually become part of a much
larger investigation. In certain areas of environmental research, data are combined not only
from a variety of contemporary experiments and routine measurements, but also from
measurements made over extended time periods.

A fact of life in environmental research is that the observed periods of measurable change for
important environmental parameters are greater than the professional lives of the scientists
investigating those parameters. It is only through careful retention of data records that data
will be useful for future applications. Perhaps a lifetime contribution of a scientist in some
critical environmental areas might be his or her data carefully and skillfully collected over a
career to support high quality scientific observations and conclusions many years in the
future.

This paper addresses the functions of data management, those activities needed to help ensure
that the investments made in data have the maximum chance for bearing a return. Please note
that data management has boundaries. It is not, for example, to be confused with the science or
technology that either produces or uses data. Within the bounds of data management are those
activities which involve the planning for, management and preservation of, and provision for
access to data held in an archive.


2.0. ENVIRONMENTAL DATA MANAGEMENT IN NOAA


2.1. CURRENT INFRASTRUCTURE

NOAA is in the midst of acquiring information to develop requirements specifications for the
modernization of its data management infrastructure. Highlighted in this evaluation is the
fact that NOAA must make mission adjustments as well as invest in improved data handling
resources to resurrect its data system to preserve data for expanded research.

NOAA manages six World Data Centers, three National Data Centers, and over thirty centers of
data. The National Data Centers constitute the formal NOAA centers and host the six World
Centers of Data. The National Centers are distributed as the climate, ocean, and geophysical
science discipline centers. The NOAA Centers of Data are made up of the various laboratories,
science, and major processing centers, all of which use and generate data in accomplishing their
mission.

The NOAA Earth Data Directory system has over 1,100 directory entries which continue to
increase as new data sets are added. The directory information is in the Directory Interchange
Format (DIF) for international interoperability.

The NOAA archives contain 150 terabytes of data in digital form, most of which are the high
volume GOES satellite image data. There are 630 data bases containing 2,350 environmental
variables. There are 375 million film records and 90 million paper records in addition to the
digital data base. The current data accession rate is 10% per year and the number of users is
increasing at a 10% annual rate. NOAA publishes 5,000 publications and distributes over one
million copies to 41,000 paying customers. Each year, over six million records are key entered
from manuscript documents and about 13,000 computer tapes and 40,000 satellite hardcopy
images are entered into the archive.







2.1.1. STORAGE MEDIA IN NOAA

Applying today's conservative approach in determining a low risk media for storing satellite
data may not suffice for tomorrow's high rate large volume data sources. The polar and
geostationary satellite data are handled differently because of the intended use of the data. The
polar satellite data is rigidly calibrated for quantitative processing, and GOES data is used
primarily in a qualitative mode for near-real-time operational support.

The 3480 magnetic tape technology offers a cost effective low risk platform for storing polar
satellite data. The anticipated data volumes from the polar satellites throughout the 1990s,
with future systems like NOAA KLM, will not change sufficiently to force a decision to switch to
a higher risk, large capacity and high data rate media.

In 1978, a contract was made with the University of Wisconsin to archive the GOES data on 19
mm SONY U-matic Beta tapes. The data are recorded in their original spacecraft retransmitted
time-stretched mode. This mode of archive will continue throughout the life of GOES-H and
likely with the METEOSAT gap-filler data. The decision of how to archive the GOES-I/M data
beginning in 1994 has not been made.

Those NASA EOS and European polar satellite platforms, which included instruments NOAA
intended to use in its operations, have been delayed beyond the year 2000. Like the NOAA KLM
program, the data rates expected from other non-NOAA satellites will not be of a magnitude to
force a change in the immediate future to a higher rate and capacity media.

The Data centers receive data on a variety of media forms. Many of the low volume data sets
can easily fit on floppy disks, 5 1/4 inch magneto optical (MO) disks, CD ROM, 4 mm DAT, or 8
mm EXABYTE tapes. At this level, the risk to the data can be reduced by saving multiple
copies of the data. However, not all of these media forms are suitable for long term archiving.
The continuing collection of conventional observation data sets likewise will not force a
change to a higher capacity media. The exception to this is the National Weather Service
NEXRAD doppler radar program. This system when fully deployed in 1997 has the potential of
accumulating about 100 terabytes a year from its 159 operational stations. Currently, the
NEXRAD system integrator is proposing a PC based EXABYTE media for the high volume level
II source data and magneto optical disks for the lower volume level III product data. There may
be a requirement to migrate from the low-end technology employed at the station level to some
more robust higher rate and density recording system for the purpose of improving future
access and retrieval.

One of three events will influence NOAA to select new media alternatives. One, a proven and
reliable cost effective media will offer an opportunity to mitigate obsolescence; or two, the
decision to change to an alternative storage media for the GOES-I program will leverage a
change; or, three, NOAA will want a higher degree of compatibility for exchange of and access
to data which will be leveraged by other programs supporting global change, i.e., EOS.


2.2. ANTICIPATED GROWTH 

The digital data volumes collected from the NOAA satellite programs will grow from the
current rate of about three terabytes per year to about seven terabytes per year from 1994
through 2002. After then, the NOAA satellite data volumes will almost triple. The non-
satellite data accession rates will increase from the current amount of about 200 gigabytes
annually to upwards of 100 terabytes annually when the National Weather Service NEXRAD
doppler radar program is fully deployed in 1997. The NOAA digital data holdings will grow
from 150 to 1,600 terabytes in the next 15 years. If the data used by NOAA from non-NOAA
sources were added, the fifteen year total would be doubled.
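
As a rough check on these projections, the growth from 150 to 1,600 terabytes over 15 years
can be converted to an equivalent compound annual rate. The sketch below is illustrative
arithmetic only. It assumes smooth compound growth, which the text does not claim, and the
function name is hypothetical.

```python
def implied_annual_growth(start_tb, end_tb, years):
    """Return the constant yearly growth rate that takes start_tb to end_tb."""
    return (end_tb / start_tb) ** (1.0 / years) - 1.0

# Figures quoted in the text: 150 TB held today, 1,600 TB in 15 years.
rate = implied_annual_growth(150.0, 1600.0, 15)
print(f"implied compound growth: {rate:.1%} per year")
```

Under that assumption the holdings would have to grow by roughly 17% per year, well above
the current 10% accession rate quoted earlier.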





2.3. CONNECTIVITY


A wide area network (WAN) has been established linking the Suitland and Camp Springs
computing centers. This network is an ethernet link using the TCP/IP protocol. The WAN has
recently been extended to the National Climate Data Center (NCDC) in Asheville, North
Carolina, and to the National Oceanographic Data Center in downtown Washington, D.C.,
using 1 Mbps carriers. The Suitland WAN has been extended to include the Satellite Data
Services Division (SDSD) of NCDC and will soon connect the SDSD office in another Camp
Springs location. Other NESDIS access to the WAN is through INTERNET. The physical links
between the World Weather Building (WWB) in Camp Springs, Maryland and Federal Building 4
(FB4) in Suitland, Maryland, use spare capacity on an analog microwave link between the WWB
and FB4. This microwave system will be upgraded to a digital system later in 1992. The WAN
interconnections are accomplished with an array of router systems connecting thin wire and
optical cable. An internal data transfer rate of 10 Mbits is available with the external data
transfer limited to the carrier service procured.

The WAN permits an expanded capability to exchange data and information between NOAA
centers. Currently, about 30 Gbytes per week are exchanged over the network. All of the
computer to computer exchange between the FB4 NESDIS and NMC computer centers use the
WAN. Remote local workstations in NESDIS with appropriate logon permission have
access to the NESDIS and NMC computer systems.
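
To put the weekly exchange volume in perspective against the quoted link rates, the sketch
below estimates how much of a week a link would be busy carrying 30 Gbytes. It is a
back-of-envelope calculation, not a measurement. It assumes decimal gigabytes, no protocol
overhead, and that all traffic crosses a single link, none of which the text states. The
function name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600

def link_utilization(gbytes_per_week, link_mbps):
    """Fraction of a week the link must run at full rate to carry the traffic."""
    bits = gbytes_per_week * 1e9 * 8          # decimal gigabytes assumed
    seconds_needed = bits / (link_mbps * 1e6)
    return seconds_needed / SECONDS_PER_WEEK

print(f"1 Mbps carrier:   {link_utilization(30, 1):.0%} busy")
print(f"10 Mbps internal: {link_utilization(30, 10):.0%} busy")
```

On these assumptions a single 1 Mbps carrier would be occupied for about 40% of the week,
which leaves limited room for the growth in access anticipated below.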

The INTERNET allows for expanded use of the WAN systems to other remote NOAA sites and to
researchers who need access to NOAA data and information. NOAA, under the guidance of a
NOAA-wide Network Advisory Board, is pursuing resources to establish and manage a NOAA
INTERNET node. With these facilities, it is expected that access to NOAA data and information
will increase dramatically, expanding almost exponentially as global change research activity
increases.

A predominant cultural tendency is direct contact. The sophisticated scientist who is a routine
user of the data knows where, what and how about the data and doesn't need search assistance
and merely phones in to order direct from the inventory. The beginning scientist, who is not
sure what the data is or how to get it and what is required to read it, will either phone and
discuss these items with a discipline scientist or a data systems person, or sometimes will visit
to personally browse through the data. Mail requests usually come from individuals looking
for a specific piece of information who generally want only an answer and not the data.
Planners, lawyers, builders, and the general public make up this group. At the National Climate
Center, 30% mail their requests, 30% fax, 33% phone the center, about 4% visit and less than
3% use electronic mail services. All requests are currently serviced with manual intervention.
The low number of electronic contacts has several reasons. One is past culture. Another is
that many users do not have the means to dial in for data and information, and most who do,
don't know how. The most widely used electronic contact is through facsimile. This is because
non-computer types can easily use the fax, though fax is probably no more widely dispersed
than personal computers (PCs) in modern offices.


2.4. ANTICIPATED USAGE CHANGES 

The rapid expansion of computer technologies and the rapid increase in media capacities
available for the desk top user has evolved a community of computer literate data brokers
looking for data. NOAA anticipates a growing need from a new user culture who will want to
perform all of their information exchange function electronically.

As the workstation performance continues to reach new heights and affordable media
technologies are developed to attach billions of bytes to the workstation, this new generation
of users interested in environmental data will come to NOAA expecting sophisticated access
and distribution support. In this environment, the current manual access and search methods
will not be sufficient to service the insatiable need for data.


When the global science community became aware of potential global environment problems
in the last decade, a ground swell of public and eventually political interest elevated the
scientific interest in determining and predicting change if, in fact, change was believed to be
taking place. In the late 1980s, government funds were authorized to increase support for
environmental observation and research. This led to heightened interest in the decades of
environmental data accumulated in NOAA's archive centers. As a result, interest in NOAA
data is expected to increase at a much greater rate than the current 10% growth.

The present NOAA Earth Data Directory is a high level information directory itemizing NOAA
archived data sets. This directory is intended to initiate the unfamiliar with NOAA data and
information. It is operated as a level-1 catalog where reference systems are not electronically
linked for passing the user to other reference directories or inventories. Most experienced
users of NOAA data already know who to contact and how to search the NOAA inventory data,
and usually order direct. Ninety per cent of the access to satellite data is made within ^ days
after the data have been collected.

New computing capacities and communications bandwidth available to users of
environmental data will demand a greater degree of automatic search and request capabilities
from the NOAA centers. NOAA plans to modernize its data services to meet the expanding
requirements for more data and increased automation of services by first, developing level-2
and eventually level-3 catalog interoperability services, and, second, implementing a
hierarchical storage system which would place a substantial amount of data on-line. The
ultimate goal would be for NOAA to develop catalog and delivery systems which would allow a
user to build a customized subset of data from the variety of data and information held by
NOAA which could be automatically assembled and dispatched to the user without delay. In
this perfect system, the user could search and retrieve in a seamless environment.


3.0. PLANNING FOR COLLECTION OF MEANINGFUL DATA

The mechanisms of nature and their interactions are often subtle and not well understood. As
a consequence, the creation and collection of experimental data must be accomplished with
care or the subtleties could be lost. This is the experimenters' province and they, alone, must
bear the responsibility for success at this early stage. The penalty for failure at this stage is
lost opportunity, not to mention a waste of resources and, worse, possibly misleading results
for future researchers.

Critical factors in data management planning may include the following:

1 - adequacy of measurement, calibration and space/time reference accuracy 

2 - adherence to standards 

3 - documentation of the data and its use history 

4 - adequacy and preservation of recording media

5 - access and distribution

This list might be considered by some to be a gratuitous list of factors that those in the business
of processing data would certainly be concerned about without the need to be reminded.
Experience, though, tells us otherwise. The NOAA GOES archive exemplifies how a data set
should not be managed; though admittedly, the GOES system was never intended to be used as a
precise measuring tool, but as a tactical tool for supporting NOAA's forecast and warning
mission. With this system, the satellite instrument responses are irreversibly altered to
normalize the aberrations resulting from nonuniform detector responses of the eight visible
channels, there was no on-board infrared calibration capability, the early geo-referencing
system depended on a 30 hour forecast of the spacecraft attitude in an environment where orbit
adjustments were periodically applied for station-keeping, the data were archived in the
spacecraft downlink format on unique 19 mm commercial video recorders, and little
documentation exists. Therefore, the twelve years of data with a data volume of more than 100
trillion bytes only has marginal use for global change science experiments. Even then, it will
cost millions of dollars to correct many of the known data set problems before the data set can be
rendered useful even as a relative data set to the science community.


4.0. MANAGING AND PRESERVING DATA


4.1. DOCUMENTATION AND STANDARDS

The national centers will require the necessary documentation to index the data to their
system and establish an information record for future applications. There are compelling
reasons for standards to document and describe the data record. A most critical reason is the
portability of data for the convenience of access and input. Standards in these cases are rules
establishing fully described links between the data and users. Another reason is to document
the record for future access, with the value of documentation geometrically increasing as the
time scale extends. Without detailed documentation, the data utility will diminish as it ages.

Standards are developed for the convenience of the user and protection of the data. Standards
are necessary for ensuring adequate portability, particularly for future applications, as the
systems used to record the data are replaced by new and sometimes different technologies. The
risks associated with fitting to unique storage or recording structures are magnified by the
frequent changes in technology. An example of the benefit of using data standards is the ANSI
header file used to label computer tape data. It provides minimal documentation in a uniform
system generated file for portability across systems, and it reduces the chance of job failures or
delays by verifying the input, thus improving system efficiency.
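
The idea of verifying the input from a standard label before a job runs can be sketched as
follows. This is a simplified illustration, not the label standard itself. It assumes an
80-character volume label record in which characters 1-4 hold the identifier VOL1 and
characters 5-10 hold the volume identifier. The function name and the sample volume names
are hypothetical.

```python
def check_volume_label(record, expected_volume):
    """Verify an 80-character VOL1 label before a job reads the tape.

    Assumed layout (simplified): characters 1-4 hold 'VOL1' and
    characters 5-10 hold the volume identifier.
    """
    if len(record) != 80:
        return False, "label record is not 80 characters"
    if record[0:4] != "VOL1":
        return False, "missing VOL1 label identifier"
    volume = record[4:10].strip()
    if volume != expected_volume:
        return False, f"mounted volume {volume!r}, expected {expected_volume!r}"
    return True, "ok"

# Hypothetical label record, padded with blanks to 80 characters.
label = "VOL1" + "NCDC01".ljust(6) + " " * 70
print(check_volume_label(label, "NCDC01"))
print(check_volume_label(label, "NCDC02"))
```

A check of this kind rejects a wrongly mounted tape before any data are read, which is the
efficiency benefit the paragraph describes.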

Some standards are set in a de facto manner where agencies who generate large numbers of
high volume data sets influence users of their data to alter their systems for the convenience of
using the data. In the communications world, the highly successful TCP/IP protocol is a case
where a de facto standard took hold. This interface protocol was developed in the absence of
network standards. Because the network was large, manufacturers and system integrators
conformed to the protocol in order to market their systems in a competitive market place. Now
it is probably the most widely used common communications standard.

Internationally, the CEOS Directory Interchange Format (DIF) was endorsed by the members of
Committee for Earth Observing Satellites (CEOS). This is a standard for structuring catalog
system directory level description data to allow international exchange of high-level metadata
information to facilitate access to data held in international centers.

Because of the magnitude of future data systems and the recognition that high-quality, long-
term data sets are essential for research, embedded documentation and portability standards
are deemed to be absolutely essential. Embedded documentation ensures that long-term data
can be identified in future applications. Portability standards such as OSI protocol standards
ensure that data can be interchanged in future applications.

However, over-documentation can have deleterious effects for users of data. For example,
packet technologies, which are widely applied for telecommunicating data, present a data
management burden when very large data sets are used, as breaking the data into many small
packets creates overhead which impedes efficient handling of the data. An example of this is
the GRIB data communications standard for transmitting point data values. This
transmission standard grew out of the teletype era when line noise and bandwidth limitations
required a high level of data "bracketing" to reduce the loss of data through garbling and drop
outs. The Global Telecommunications System is totally dependent on this standard. Datasets
archived in this format are highly inefficient for modern computer processing approaches.
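
The cost of breaking a large data set into many small packets can be illustrated with
simple arithmetic. The sketch below is not a model of GRIB or of any real protocol. The
data set size, payload sizes, and the 16 bytes of per-packet bracketing are hypothetical
values chosen only to show the trend.

```python
import math

def packet_overhead(total_bytes, payload_bytes, header_bytes):
    """Return (packet count, fraction of transmitted bytes that is overhead)."""
    packets = math.ceil(total_bytes / payload_bytes)
    overhead = packets * header_bytes
    return packets, overhead / (total_bytes + overhead)

# Hypothetical sizes: a 1 GB data set with 16 bytes of bracketing per packet.
for payload in (64, 1024, 65536):
    packets, fraction = packet_overhead(10**9, payload, 16)
    print(f"{payload:6d}-byte packets: {packets:9d} packets, {fraction:.1%} overhead")
```

With these numbers, 64-byte packets spend a fifth of the transmitted bytes on bracketing and
require more than fifteen million packets to be handled, while 64 Kbyte packets reduce both
costs to a negligible level.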


4.2. MEDIA

The media used to store data for future reference and, in some cases permanently, is a critical
factor to consider in data management. Today, the advances made in the development of
sophisticated media capabilities coupled with an ever increasing variety of media technologies
have expanded the options to consider in managing data. In the past, the choices were
manuscript, paper computer output, or film, all of which had known life cycle and risk factors.
Now a variety of magnetic and optical tapes and disks are among the current computer-form
choices, and perhaps in the future, crystal and molecular storage systems will expand this list.
The never ending development of media technologies has created a dilemma for the data
manager who must now determine which media best suits the user requirements for both the
present and future. Issues of cross system compatibility, future system portability, and
expected media life have equal weight in the decision process.

For the data production manager, the media which can keep pace with the data flow and, at the
same time, is the most cost effective, is the media of choice. For the scientist, it is the media
which holds the most data on-line for convenient and efficient access. For the archivist, it is
the media which contains the most data in a given unit volume, and which has the longest
expected life when stored in a passive state. Often these requirements are in conflict with each
other. Then, what is the solution? For some data sets, particularly small volume data sets,
employing multiple types of media would be acceptable. However, other strategies are
necessary for large volume data sets where redundant storage forms would either be
unmanageable or too costly. These conflicting needs force the archive manager to plan to
migrate between media types.

Emerging media technologies present an elevated risk to data, as these technologies often do
not have a sufficient demonstrated performance record to fully predict longevity
characteristics. Also, many technologies emerge without standards, creating havoc with
portability and system interface requirements. An example of this is the implementation of
optical disk systems of which there are many form factors. After over five years of optical
systems availability, only the CD-ROM has a standard (ISO 9660), and this process was
leveraged primarily by the consumer audio CD market. The magneto optical (MO) recording
system implementers have never developed standards and the use of these systems elevates the
risk to the data because of the market place volatility and because of the virtual absence of
recording standards. The higher capacity 12 and 14 inch write once, read many (WORM) optical
media are relatively high cost systems, and like the MO recording systems, have no established
standards, which inhibits portability and elevates the risk to the data in an archive
environment.

Another concern is the mechanical performance characteristics of media. Today, probably
the most stable archive media available is film. This, however, is not suitable for rapid access
and deployment of data used in computer applications. The development of magnetic media
continues to advance as compared to the optical media forms which so far continue to emerge.
However, magnetic media technologies present some risk factors for data longevity. The
earlier open reel tape technologies were long considered stable for archive purposes, but not
without mechanical management to prevent material deformation and magnetic read through.
The introduction of the IBM 3480 cartridge tape eliminated many of the earlier tape
management problems by improving the system mechanical handling properties and error
correction code applications. However, standard computer tapes are capacity limited and are
not suitable for some future ultra-high volume data observation systems.

The helical scan recording technologies adopted from the video recording industry offer the
necessary recording rate and capacity to meet high rate and volume requirements. These
media have physical characteristics which introduce new risk factors to data intended for
long-term retention. For example, the magnetic recording surface is an unoxidized pigment
which may be susceptible to corrosive pollutants. The substrate used as the backing is
stretched to remove its elasticity (tensilized) for rigid tape to recording head control and to
increase the run length of tape in a small cassette. The tensilized polyester substrate is
predicted to have a tendency to relax or shrink over time, affecting the head tracking servo and
increasing the risk to successful data recovery. And, these conditions are likely to be
accelerated by stress induced with excessive heat and humidity.

Until the known risk factors with magnetic media are mitigated, the data manager must
develop tools to carefully monitor the media recording validity, track the media performance
over time, and exercise rigid environmental storage controls to minimize these known stress
factors. However, the most important element in managing data in an archive environment is
to include resources to allow for periodic data migration to avoid the risk of losing data due to
mechanical failures or system obsolescence, which alone is a risk common to all systems.
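
One way such a monitoring tool might check recording validity is to store a checksum when
each data set is written and to recompute it on every periodic read-back. The sketch below
illustrates that idea only. The paper does not describe a specific method, the choice of
SHA-256 is an assumption, and the data set names and byte values are hypothetical.

```python
import hashlib

def checksum(data):
    """Return a SHA-256 digest for a block of archived bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_holdings(holdings, manifest):
    """Compare each stored block against the digest recorded at write time.

    holdings: dict of data set name -> bytes read back from the media
    manifest: dict of data set name -> digest recorded when written
    Returns the names that should be recovered from a second copy or migrated.
    """
    return sorted(name for name, data in holdings.items()
                  if checksum(data) != manifest.get(name))

# Hypothetical example: one block has suffered a single altered byte.
written = {"goes_1980_001": b"\x01\x02\x03\x04", "noaa9_1985_120": b"\x10\x20\x30"}
manifest = {name: checksum(data) for name, data in written.items()}
read_back = dict(written, goes_1980_001=b"\x01\x02\xff\x04")
print(verify_holdings(read_back, manifest))
```

Logging the date and outcome of each verification pass would give the record of media
performance over time that the paragraph calls for, and would indicate when migration is due.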

However, data longevity is not the only factor to consider. Data access is an equally important 
factor. 


5.0. PROVIDING ACCESS TO ENVIRONMENTAL DATA

The quality of the data and its documentation and the preservation of these data in a lasting
state have little value if access to the data and information is hindered by loss of logistics
control or excessive recovery costs. A sophisticated indexing scheme is necessary for locating
data and information in the future. As the collection of data continues, redoubling the
problem of storing and finding the data, the level of index sophistication must necessarily
increase to facilitate an increasing degree of automation in the search process. For example, to
research a publication today in most libraries, one must know either the author or title. Some
levels of "key word" searching are being made available but on a limited basis. The largest
public library in this country is the New York library. It is estimated that its entire collection
if digitized would represent a thousand trillion bits, or one petabit. The NASA EOS program in
just 15 years will produce over 20 petabits of data. Without a very high degree of automatic
indexing across many science disciplines, the utility of these diverse observation and research
data sets will be greatly diminished.
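
The "key word" searching mentioned above is commonly built on an inverted index, which maps
each word to the data sets whose descriptions contain it. The following is a minimal sketch
of that general technique, not a description of any NOAA or library system. The catalog
entries and identifiers are hypothetical.

```python
from collections import defaultdict

def build_index(catalog):
    """Map each lower-cased key word to the set of data set ids that use it."""
    index = defaultdict(set)
    for dataset_id, description in catalog.items():
        for word in description.lower().split():
            index[word].add(dataset_id)
    return index

def search(index, *keywords):
    """Return ids of data sets whose descriptions contain every key word."""
    sets = [index.get(word.lower(), set()) for word in keywords]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical catalog entries.
catalog = {
    "DS001": "GOES visible imagery full disk",
    "DS002": "NEXRAD doppler radar reflectivity",
    "DS003": "GOES infrared imagery",
}
index = build_index(catalog)
print(search(index, "goes", "imagery"))
print(search(index, "radar"))
```

Because the index is built automatically from the descriptions, it scales with the
collection in a way that manual author and title lookup cannot.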

However, once indexed, the researcher is faced with the problem of porting the data from an
archive system for processing. There are many problems to address here. How will the data be
packaged? If the data set is very large, and most will be after a long period of time, how can
subsets of the data and combinations of different data be repackaged in a portable, useful, and
affordable tool?

Our data problem today is akin to the comparison of the early country store and the modern
supermarket [1]. The country store survived in an era when the variety of products and
packaging was simple. There, the customer would order across a counter and the access service
would be provided by a clerk. The clerk was the data base directory and inventory and the
user's lexicon was simplified by the limited variety of available products. Imagine trying to
shop in a modern supermarket using that type of access environment. Most of today's data
bases put the user in that position, as there are only rare instances of browse data and the
available lexicon matches are limited to high order directory services. From a researcher's
point of view, modern data bases should be like the super market, where one can browse
through the aisles, view the variety of products for selection, as well as read the labels
containing the technical specifications of the products.

In the world of data and information, an information directory alone does not provide
sufficient information for a user to find and ultimately use data. The next logical level for a
user to search is the inventory which describes the physical arrangement of data. However, an
ability to look at or visualize the data would increase the level of understanding of the data.
Thus, a browse data set becomes increasingly important to the user. In fact, the level of
importance of browse utility is likely to increase as the data ages.

If this hierarchy of searching is complicated enough for a single data set, what about other
related data sets held possibly in the same or other archive centers? This opens up another
aspect of browsing through the shelves of the supermarket. In the super market, all of the
similar products are on adjacent shelves. The catsup, mustard, mayonnaise, etc., are usually
together, the baking stuff is usually grouped, etc. Another feature of food marketing is that the
snacks, such as chips, are almost everywhere. The same should hold for data. The variety of
data sets within a discipline should be linked at the inventory level and the ancillary in-situ
data, i.e., chips of the data world, should be more widely cross linked across data set groups.
Searching for data should be as simple as searching through the super market shelves where
everything in the store can be felt, read and compared by the user. For the users who know
exactly what they want, they can simply proceed to the appropriate aisle and select the product
and leave.

But what happens when today's super market becomes a megamarket as the data world is fast
experiencing? Could one afford to stroll the aisles browsing through unimaginable varieties of
products? What if the products were packaged in pallet sized cases? Could one lug the package
home and shelve it there? That is what the data world is beginning to experience. In order to
survive this environment, one will need to have a better sense of where to look and must be able
to apportion manageable amounts of data for local consumption. In the data management
world, knowledge based help tools will be necessary to assist the user, and the user must have
the capability to extract pieces of data for local consumption.
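
Extracting a manageable piece of a large data set usually means selecting by region and
period before anything is shipped to the user. The sketch below shows that kind of
selection on a small in-memory list. The record layout, field names, and station values
are hypothetical and do not represent an actual archive format.

```python
def extract_subset(records, lat_range, lon_range, year_range):
    """Keep only the records inside the requested box and period."""
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    first_year, last_year = year_range
    return [r for r in records
            if lat_min <= r["lat"] <= lat_max
            and lon_min <= r["lon"] <= lon_max
            and first_year <= r["year"] <= last_year]

# Hypothetical station observations.
records = [
    {"station": "A", "lat": 35.6, "lon": -82.6, "year": 1975, "temp_c": 12.9},
    {"station": "B", "lat": 38.8, "lon": -76.9, "year": 1988, "temp_c": 13.4},
    {"station": "C", "lat": 61.2, "lon": -149.9, "year": 1988, "temp_c": 2.1},
]
subset = extract_subset(records, (30, 45), (-90, -70), (1980, 1990))
print([r["station"] for r in subset])
```

Performing the selection at the archive, rather than after delivery, is what keeps the
amount of data moved within what the user can carry home.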

Once we find the data, how do we get it home to use? In the super market scenario, if you are
walking, you should be careful to purchase only what you can carry, or, if you are driving, then
you are limited by the capacity of your vehicle. In either case, you should know your
limitations before you buy. In the world of data, aside from the cost, your limitations are the
bandwidth you can afford both in terms of electronic transfer bandwidth and the media
compatibility. This is where media portability, through both mechanical and applied
standards, becomes an important issue. Another important aspect not to be overlooked is the
efficiency of the media for processing.


6.0. CONCLUSION

The media used for recording and storing data is only one aspect of data management. Finding
and acquiring data is the other. However, the life and vitality of the data are dependent on the
media capsule, and the lack of care and handling here determines the ultimate future of the
data. Thus, all the efforts to develop robust data management search and select tools can only
be secondary to the media technologies used to convey the data in storage and use.


REFERENCES 

[1] R. Jenne, NCAR, A National Grocery Store System, April 3, 1991





N93-30455

INTERIM REPORT ON LANDSAT NATIONAL ARCHIVE ACTIVITIES 


John E. Boyd


U.S. Geological Survey


EROS Data Center, Sioux Falls, SD 57198


ABSTRACT 




The Department of the Interior (DOI) has the responsibility to preserve and to distribute most
Landsat Thematic Mapper (TM) and Multispectral Scanner (MSS) data that have been acquired
by the five Landsat satellites operational since July 1972. Data that are still covered by
exclusive marketing rights, which were granted by the U.S. Government to the commercial
Landsat operator, cannot be distributed by the DOI. As the designated national archive for
Landsat data, the U.S. Geological Survey's EROS Data Center (EDC) has initiated two new
programs to protect and make available any of the 625,000 MSS scenes currently archived and
the 200,000 TM scenes to be archived at EDC by 1995.

A specially configured system has begun converting Landsat MSS data from obsolete high-
density tapes (HDTs) to more dense digital cassette tapes. After transcription, continuous
satellite swaths are (1) divided into standard scenes defined by a world reference system, (2)
geographically located by latitude and longitude, and (3) assessed for overall quality. Digital
browse images are created by subsampling the full-resolution swaths. Conversion of the TM
HDTs will begin in the fourth quarter of 1992 and will be conducted concurrently with MSS
conversion. Although the TM archive is three times larger than the entire MSS archive,
conversion of data from both sensor systems and consolidation of the entire Landsat archive
at EDC will be completed by the end of 1994.

Some MSS HDTs have deteriorated, primarily as a result of hydrolysis of the pigment binder.
Based on a small sample of the 11 terabytes of post-1978 MSS data and the 41 terabytes of TM
data to be converted, it appears that to date, less than 2 percent of the data have been lost. The
data loss occurs within small portions of some scenes; few scenes are lost entirely.
Approximately 10,000 pre-1979 MSS HDTs have deteriorated to such an extent, as a result of
hydrolysis, that the data cannot be recovered without special treatment of the tapes. An
independent consulting division of a major tape manufacturer has analyzed affected tapes and
is confident that restorative procedures can be applied to the HDTs to permit one pass to
reproduce the data on another recording medium.

A system to distribute minimally processed Landsat data will be procured in 1992 and will be
operational by mid-1994. Any TM or MSS data in the national archive that are not restricted
by exclusive marketing rights will be reproduced directly from the archive media onto
user-specified computer-compatible media. TM data will be produced either at a raw level
(radiometrically and geometrically uncorrected) or at an intermediate level (radiometrically
corrected and geometrically indexed). MSS data will be produced to an intermediate level or to
a fully corrected level (radiometrically corrected and geometrically transformed to an Oblique
Mercator projection). The system will be capable of providing ordered scenes within 48 hours
of receipt of order.






N93-30456



MR-CDF: Managing Multi-Resolution Scientific Data

Kenneth Salem 

Computer Science Department, University of Maryland 
College Park, MD 20742 
and 

CESDIS, NASA Goddard Space Flight Center, Code 930.5
Greenbelt, MD 20771 

Abstract 

MR-CDF is a system for managing multi-resolution scientific data sets. It is an extension
of the popular CDF (Common Data Format) system. MR-CDF provides a simple functional
interface to client programs for storage and retrieval of data. Data is stored so that
low-resolution versions of the data can be provided quickly. Higher resolutions are also available,
but not as quickly. By managing data with MR-CDF, an application can be relieved of
the low-level details of data management, and can easily trade data resolution for improved
access time.


1 Introduction 


Scientific data management libraries, such as NASA's publicly distributed Common Data
Format (CDF) [Tr90, TrGo90], implement simple data models that are tailored for scientific data.
Data managed using these libraries is machine-independent, portable, and self-describing.
Access to the data is performed through a set of interface functions that shield the details of
storage and retrieval from application programs. The libraries provide a common interface upon
which portable, application-specific tools (e.g., classifiers, analysis packages, visualization and
browsing tools) can be implemented.

Because scientific data sets are often voluminous, it is desirable to make them available
at several different resolutions. Preliminary examination, or browsing, of large amounts of
data often can be performed efficiently using low-resolution data. Tentative analyses can be
performed using intermediate resolutions, and the final analysis can be performed using the
data's full resolution. Lower resolution data is desirable during the preliminary stages because
it allows large volumes of data to be considered in a reasonable amount of time.









Figure 1: The Multi-Stage Representation Used by MR-CDF

This paper describes a scientific data management library called MR-CDF (Multi-Resolution
Common Data Format) which permits multiple-resolution data sets to be manipulated through
a simple, functional interface. Application programs that use MR-CDF see a scientific data
model identical to that supported by CDF. When retrieving data, however, they are able to
specify a desired resolution level. Applications requiring full-resolution data can obtain it, while
those that can use lower resolutions are able to do so simply and quickly.

MR-CDF uses a multi-stage representation for stored multi-resolution data. This is
illustrated in Figure 1. A data set that is to be made available at R different resolutions is
decomposed into R stages, each of which is stored. The decomposition is such that, by retrieving and
combining i stages, MR-CDF can produce the data at one resolution, and by retrieving i + 1
stages it can produce the data at a higher resolution. By retrieving and combining all of the
stages, MR-CDF can produce an exact reconstruction of the data at its original, full resolution.
The process of retrieving and combining stages is completely transparent to the application that
requested the data, except that lower resolution requests can be satisfied more quickly than
others.

There are several difficulties involved in providing an abstract interface for multi-resolution
data. The first is the wide variety of techniques that can be used to decompose data into stages
for storage. As we shall describe shortly, the decomposition process is essentially an iterative
lossy compression of the data. A wide variety of compression techniques are available, and
different techniques are well-suited to different types of data. Examples include various region
averaging algorithms, vector quantization, and quadtree-like methods [TiMa90, Ti89]. The
procedure for properly recombining the stages when data is retrieved depends on which of the
many possible compression techniques was originally used to decompose the data. Tying MR-CDF
to any particular compression technique would severely limit its applicability. Instead, MR-CDF
must be flexible enough to accommodate a wide variety of application-specified techniques.

A second difficulty arises when applications make use of MR-CDF's simple selection facility
to retrieve only a portion of the stored data set. Ideally, MR-CDF would perform the selection
before recombining the stages to minimize the volume of data to be retrieved and recombined.
This may or may not be possible, depending on which technique was used to produce the
stored stages. Some compression techniques are better suited than others for producing easily
manageable data, at least within the framework of MR-CDF. Although such problems need not
limit the functionality of MR-CDF, they may impact its efficiency.

In the remainder of this paper, we describe the design, interface, and implementation of the 
MR-CDF library. The next section provides an overview of the features of MR-CDF. Sections 
3 and 4 describe the relationship between data compression and MR-CDF, and how multi- 
resolution data is stored into and retrieved from an MR-CDF archive. Finally, Section 5 describes 
its implementation, which uses CDF's data storage and retrieval facilities.

2 What Does MR-CDF Do? 

The MR-CDF library does not attempt to provide a solution to the entire scientific data
management problem. An MR-CDF archive is designed to hold a set of related, similarly organized
scientific data, such as a set of images or a stream of sensor data. The important task of
organizing and managing multiple data sets is left to some type of meta-database, such as those
described in [RoCa90, ShWa88], and is beyond the scope of MR-CDF (and CDF).

What MR-CDF does provide is a simple, abstract programming language interface to
scientific data. MR-CDF extends the CDF interface to provide support for multi-resolution data
sets. It has all of the capabilities of CDF for storage, retrieval, and organization of data, plus
the following:

• MR-CDF allows selected data to be retrieved at several different levels of resolution.
Lower-resolution data can be retrieved more quickly than higher-resolution data, allowing
applications to trade off retrieval time for resolution.

• Multi-resolution retrieval in MR-CDF is progressive. This means that once a low resolution
version of the data has been retrieved, a higher resolution version of the same data can be
retrieved in less time than would be required to retrieve the higher resolution data from
scratch.


(a) Full Resolution (b) Low Resolution (c) Medium Resolution

Figure 2: A Simple Time-Series Data Set


The multi-resolution capability of MR-CDF makes it simple for application programs to
select a resolution that is suitable for the task at hand. Progressive retrieval is well-suited
to applications such as data browsing. For example, an image browsing program can provide
access to many low-resolution images quickly. When an interesting image is found, a progressive
retrieval capability allows the browser to provide a higher-resolution version of the interesting
image without retrieving the information contained in the low-resolution image a second time.
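
The progressive behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each stage is taken to be already decompressed into a full-length list of corrections, and the class and attribute names (`ProgressiveReader`, `fetches`) are hypothetical and are not part of the MR-CDF interface.

```python
class ProgressiveReader:
    """Illustrative only: keeps the merged buffer between requests."""

    def __init__(self, stages):
        self.stages = stages          # stages[i]: full-length corrections for level i
        self.buffer = [0.0] * len(stages[0])
        self.level = -1               # highest level merged so far
        self.fetches = 0              # number of stage reads performed

    def read(self, level):
        if level < self.level:        # a lower level was requested: start over
            self.buffer = [0.0] * len(self.stages[0])
            self.level = -1
        for i in range(self.level + 1, level + 1):
            self.fetches += 1         # only stages not yet merged are fetched
            self.buffer = [b + s for b, s in zip(self.buffer, self.stages[i])]
        self.level = level
        return list(self.buffer)


reader = ProgressiveReader([[4.0, 4.0, 3.0, 3.0], [-1.0, 1.0, 0.5, -0.5]])
browse = reader.read(0)   # low-resolution view
detail = reader.read(1)   # refinement reuses the stage-0 data already merged
```

Moving from level 0 to level 1 costs one additional stage read rather than two, which is the saving that a browsing application would exploit.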

2.1 Multi-Resolution Data

Figure 2(a) shows a plot of some time-series data representing a hypothetical measured quantity
“MEAS”. Data of this type might be stored in an MR-CDF archive. For the purposes of
this example, suppose that the MEAS variable is of the MR-CDF-defined floating-point type
“REAL-4”. We will use this example to describe how multi-resolution data in MR-CDF is
viewed by application programs.

When a multi-resolution variable is created in an MR-CDF archive, the number of resolutions
at which it can be made available is defined. Applications retrieve the values of multi-resolution
variables exactly as they would a single-resolution variable, except the desired resolution level
must be specified as well. A resolution level is specified as an integer between zero and the
number of resolutions defined for that variable, minus one. Smaller numbers represent lower
resolutions.

Suppose that “MEAS” is stored as a variable with three possible resolutions. If an application
retrieved “MEAS” at resolution zero, it might receive the data plotted in Figure 2(b). Data
at resolution one might look as plotted in Figure 2(c), while the data at resolution two would
match the full-resolution data in Figure 2(a) exactly.

An important feature of MR-CDF is that the application will receive the same volume and
type of data, regardless of the resolution level requested. In our example, the application can
expect to receive ten REAL-4 values, regardless of resolution. This greatly simplifies data
handling in MR-CDF applications. From the application's perspective, the advantage of lower
resolution data is that MR-CDF can provide it more quickly. Although the volume of data
passed to the application is independent of the resolution, MR-CDF needs to retrieve less data
from its archive to produce the lower resolutions.

3 Producing Data for MR-CDF 

The low resolution data in Figure 2(b) were obtained by averaging groups of four values from
the original series (Figure 2(a)) and replacing the values in each group by their average. This
averaging procedure is a form of lossy data compression, since the low resolution series can be
represented using a quarter of the values required for the original series. MR-CDF is specifically
designed to manage multi-resolution data that is produced by applying lossy compression to the
full-resolution data. Of course, there are many more effective and sophisticated compression
techniques than the averaging procedure used in the example.
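
The averaging procedure can be written down directly. The sketch below is illustrative only; the function name `block_average` and the sample values are made up for the example and do not come from MR-CDF.

```python
def block_average(series, group=4):
    """Replace each group of `group` values by the group's mean (lossy)."""
    out = []
    for start in range(0, len(series), group):
        chunk = series[start:start + group]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))   # same length as the input
    return out


full = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]
low = block_average(full)
```

The result has as many values as the original series, but only one distinct value per group of four needs to be stored, which is where the factor-of-four reduction comes from.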

Data compression is not performed by the MR-CDF library. Instead, it is assumed that 
the compressed data is produced externally and then stored in the MR-CDF archive. MR- 
CDF performs the decompression and combination of the stored data in response to application 
requests. 

In principle, it would be possible for compression to be implemented within MR-CDF. In
practice, however, compression of the data is often much more time consuming than
decompression. Many compression algorithms are best performed on highly parallel machines or with
special purpose hardware.¹

¹ For example, compressing data using vector quantization involves vectorizing the input data and comparing
each vector against a “codebook” of vectors to find the closest match. Using parallel hardware, the input vector
can be compared against all of the codebook entries simultaneously. Decompressing the data involves much less
work, since a simple lookup in the codebook is all that is required to recreate each vector.
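
The asymmetry between vector quantization encoding and decoding can be seen in a small sketch. It assumes a tiny, hand-made codebook and a squared-distance match; the function names are hypothetical, and real codebooks are far larger and are built by a training procedure not shown here.

```python
def nearest_code(vector, codebook):
    """Encoding: search every codebook entry for the closest match."""
    best, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if dist < best_dist:
            best, best_dist = index, dist
    return best


def vq_compress(data, codebook, dim):
    """Split `data` into vectors of length `dim` (len(data) assumed a multiple of dim)."""
    vectors = [data[i:i + dim] for i in range(0, len(data), dim)]
    return [nearest_code(v, codebook) for v in vectors]


def vq_decompress(codes, codebook):
    """Decoding: one table lookup per code word."""
    out = []
    for code in codes:
        out.extend(codebook[code])
    return out


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
codes = vq_compress([0.1, 0.2, 3.5, 4.2, 0.9, 1.3], codebook, dim=2)
approx = vq_decompress(codes, codebook)
```

Encoding visits every codebook entry for every input vector, whereas decoding touches exactly one entry per code, so the work needed at retrieval time stays small.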







Figure 3: Creating Data for MR-CDF 

Figure 3 illustrates how data suitable for progressive, multi-resolution retrieval is produced
and stored in MR-CDF. An iterative compression technique (described below) is applied to the
original full-resolution data, resulting in several stages of compressed data. The compressed
data are stored in the MR-CDF archive. The stages are such that MR-CDF will be able to
recreate the data at resolution level i by retrieving stages 0 through i from the archive and
then decompressing and recombining them. We will describe the retrieval procedure in more
detail shortly.

The iterative compression algorithm shown in the figure actually represents a general class of 
compression procedures. During each iteration, data is compressed using some lossy compression 
technique, and then decompressed. The difference between the decompressed data and the 
original is computed. This difference, or error, becomes the data that is compressed during the 
next iteration. 

The iterative compression procedure for computing three stages of compressed data is
illustrated in more detail in Figure 4. As illustrated, a lossy compression function f_i is used
to produce the stage-i data. Since the compression function is lossy, the decompressed data
will not match the data that was compressed exactly. The difference between the input data
D_i (where D_0 is the original data) and the decompressed data f_i^-1(f_i(D_i)) is the error
(residual) data, which is used as the input to the next compression stage. In the figure, the
shaded boxes represent the compressed data stages which are actually stored in MR-CDF.
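
A minimal sketch of the iterative procedure follows. It makes two assumptions that are not taken from the paper: group averaging stands in for the lossy compression functions, and the final residual is stored as is so that the full-resolution data can be recovered exactly. All names are illustrative.

```python
def compress_avg(data, group):
    """Lossy step: keep one mean per group of `group` values."""
    return [sum(data[i:i + group]) / len(data[i:i + group])
            for i in range(0, len(data), group)]


def decompress_avg(means, group, length):
    """Expand each stored mean back over its group."""
    out = []
    for m in means:
        out.extend([m] * group)
    return out[:length]


def make_stages(original, groups=(4, 2)):
    stages = []
    residual = list(original)
    for g in groups:
        packed = compress_avg(residual, g)                 # compress current data
        stages.append(packed)
        approx = decompress_avg(packed, g, len(residual))  # decompress it again
        residual = [r - a for r, a in zip(residual, approx)]  # error feeds next pass
    stages.append(residual)   # assumed: last stage holds the remaining error
    return stages


stages = make_stages([1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0])
```

With this toy compressor the three stages hold 2, 4, and 8 values, more than the 8 original values; a more capable compressor would be needed for the stages to be compact.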

The example in Figure 4 may be somewhat misleading since it suggests that the compressed
data stages together occupy more space than does the original, full-resolution data set. In
practice, this need not be the case. For more realistic compression techniques, such as those
described in [TiMa90, Ti89], the stages taken together are about as voluminous as the original
data. The compressed stages can be thought of as an alternative representation of the original
data which makes multi-resolution retrieval more convenient.


4 Retrieving Data from MR-CDF 

When an application requests data from MR-CDF, it specifies a desired resolution level. MR-CDF
supplies the data at the specified resolution by retrieving one or more of the compressed
data stages from the archive. To retrieve data at resolution i, stages 0 through i are retrieved.
The retrieved stages are then decompressed and combined to produce the desired data.

Figure 5 illustrates how MR-CDF would handle a request to retrieve the data compressed
as illustrated in Figure 4 at resolution level two. (In this case, resolution level two corresponds
to the original, full-resolution data.) Since resolution level two is requested, MR-CDF retrieves
the stage-0, stage-1, and stage-2 compressed data. The first two stages are decompressed, and
the resulting data is additively merged into a single buffer. In this case, the buffer will contain
an exact recreation of the original data D_0.

If a lower resolution level is specified, MR-CDF need only retrieve and decompress some of the
stages. For example, for resolution level 0, only the stage-0 data is retrieved and decompressed.
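
The retrieval rule (stages 0 through i, decompressed and additively merged) can be sketched as follows. The stage contents are made-up sample values, group averaging stands in for the real compression technique, and the last stage is assumed to be stored uncompressed; none of the names belong to MR-CDF.

```python
def expand(means, group, length):
    """Stand-in decompressor: repeat each stored mean over its group."""
    out = []
    for m in means:
        out.extend([m] * group)
    return out[:length]


def retrieve(stages, groups, level, length):
    """Rebuild the data at `level` by summing decompressed stages 0..level."""
    buffer = [0.0] * length
    for i in range(level + 1):
        if i < len(groups):
            part = expand(stages[i], groups[i], length)
        else:
            part = stages[i]          # assumed: final stage stored uncompressed
        buffer = [b + p for b, p in zip(buffer, part)]
    return buffer


stages = [[4.0, 3.0],
          [-2.0, 2.0, -1.0, 1.0],
          [-1.0, 1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
groups = (4, 2)
level0 = retrieve(stages, groups, 0, 8)
level1 = retrieve(stages, groups, 1, 8)
level2 = retrieve(stages, groups, 2, 8)
```

Every level returns eight values, so the application always receives the same volume of data; only the number of stages read from the archive changes.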













Figure 5: Decompression Procedure - Three Stages 


4.1 Decompression Functions

MR-CDF's retrieval procedure requires that a set of decompression functions be available. Since
an arbitrary compression function can be used to produce the stages, MR-CDF must be informed
of the proper decompression function to apply at the time of retrieval. When an application
stores compressed data in MR-CDF, it is required to register an appropriate decompression
function with the library, as was illustrated in Figure 3.

A decompression function is an arbitrary procedure which accepts a set of parameters
supplied by MR-CDF. These parameters include pointers to the source buffer holding the
compressed data and a target buffer into which the decompressed data is to be placed. Additional
information, such as the sizes of the buffers and their data types, is also provided.

Each time a new multi-resolution variable is defined in MR-CDF, the names of the
decompression functions to be used for each compressed data stage must also be supplied. Every
decompression function is registered under a particular name. New variables that use the same
decompression functions as existing variables may refer to those functions by name.

A decompression function defines a mapping from compressed data to decompressed data.
In many cases, it is most convenient to implement the function as a generic piece of code, plus
some additional data. For example, vector-quantized data can be decompressed by a simple
function which looks up each code word in a codebook. Changing the codebook changes the
decompression function that is being implemented, but the generic code itself need not be
modified.


Figure 6: Implementing MR-CDF with CDF

Since this is a common occurrence, MR-CDF allows auxiliary data to be stored with each
stage. At decompression time, both the compressed stage data and the auxiliary data are
supplied to the decompression function. The advantage of auxiliary data is that common,
generic decompression functions need only be registered once with MR-CDF.
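
The idea of registering a generic decompression function once, by name, and varying only the auxiliary data can be sketched as below. The registry, the function signature, and the name "vq_lookup" are hypothetical illustrations and do not reproduce the MR-CDF C interface.

```python
DECOMPRESSORS = {}   # hypothetical registry: name -> decompression function


def register(name, func):
    DECOMPRESSORS[name] = func


def vq_lookup(source, aux, length):
    """Generic decoder: `source` holds code words, `aux` is the codebook."""
    out = []
    for code in source:
        out.extend(aux[code])
    return out[:length]


def decompress(name, source, aux, length):
    """Look the function up by name and hand it stage data plus auxiliary data."""
    return DECOMPRESSORS[name](source, aux, length)


register("vq_lookup", vq_lookup)          # registered once

codebook_a = [[0.0, 0.0], [1.0, 2.0]]     # auxiliary data stored with one stage
codebook_b = [[9.0, 9.0], [5.0, 6.0]]     # auxiliary data stored with another
out_a = decompress("vq_lookup", [1, 0, 1], codebook_a, 6)
out_b = decompress("vq_lookup", [1, 0, 1], codebook_b, 6)
```

The same registered code yields two different mappings because the codebook travels with the stage rather than with the function.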

5 Implementation 

To an application, MR-CDF provides a superset of the services provided by CDF. MR-CDF is 
also implemented using CDF. All data storage and retrieval is performed by CDF. MR-CDF acts 
as a coordinator between a group of CDF archives and the application-specified decompression 
procedures. 

Each MR-CDF archive is implemented as a set of CDF archives. Specifically, an MR-CDF
archive is implemented by a single base CDF plus a set of stage CDFs for storing the compressed
data stages. There is a stage CDF for each stage of every multi-resolution variable defined in
the archive. This is illustrated (for an archive with a single multi-resolution variable) in Figure 6.

An MR-CDF archive may contain a mix of single-resolution and multi-resolution variables.
Single-resolution variables are implemented directly in the base CDF. Requests to store and
retrieve such variables are translated into appropriate CDF calls on the base CDF. In addition,
the base CDF maintains global information about the MR-CDF archive such as the number
of defined variables and their names. It also maintains general information about the
multi-resolution variables in the archive.

When MR-CDF receives a request to retrieve a multi-resolution variable, the following steps
occur. MR-CDF first retrieves general information about the variable (such as the number of
stages that are available and their sizes) from the base CDF. Using this information, MR-CDF
then translates its request to a series of retrieval requests on the stage CDFs.

As data is retrieved from each stage, it is placed in a holding buffer and then passed through
the appropriate decompression function. (Auxiliary decompression information is stored as
attributes of the stage CDFs and is retrieved using the attribute/value manipulation facility
provided by the CDF library.) The decompressed data is then trimmed and merged with data from
the other stages in the merge buffer. To avoid unnecessary copying of data, the decompressed
and trimmed stages are merged directly into the application's buffer.
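
The retrieval steps above can be mimicked with plain dictionaries standing in for the base CDF and the stage CDFs. Everything in this sketch is an assumption made for illustration: the layout, the stage names such as "MEAS.s0", and the `expand` decompressor are invented, and no CDF library calls are used.

```python
def expand(values, aux, length):
    """Hypothetical decompressor: repeat each stored value `aux` times."""
    out = []
    for v in values:
        out.extend([v] * aux)
    return out[:length]


archive = {
    "base": {    # stands in for the base CDF: per-variable metadata
        "MEAS": {"length": 8, "stage_files": ["MEAS.s0", "MEAS.s1"]},
    },
    "stages": {  # stands in for the stage CDFs; `aux` plays the role of an attribute
        "MEAS.s0": {"data": [4.0, 3.0], "aux": 4, "decompress": expand},
        "MEAS.s1": {"data": [-2.0, 2.0, -1.0, 1.0], "aux": 2, "decompress": expand},
    },
}


def retrieve(archive, name, level, start, count, app_buffer):
    info = archive["base"][name]                         # 1. metadata from the base
    for stage_file in info["stage_files"][:level + 1]:   # 2. one request per stage
        stage = archive["stages"][stage_file]
        full = stage["decompress"](stage["data"], stage["aux"], info["length"])
        trimmed = full[start:start + count]              # 3. keep the selected part
        for k, value in enumerate(trimmed):              # 4. merge into caller's buffer
            app_buffer[k] += value
    return app_buffer


selection = retrieve(archive, "MEAS", 1, 2, 3, [0.0] * 3)
whole_low = retrieve(archive, "MEAS", 0, 0, 8, [0.0] * 8)
```

Here trimming happens after decompression; whether the selection can be applied earlier depends on the compression technique, as noted in the introduction.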

MR-CDF runs on UNIX systems for which CDF is supported. Currently, only the C language
interface is available. Since UNIX does not provide a run-time linking facility, it is not possible 
to define new decompression functions to the MR-CDF library without recompiling it. (New 
multi-resolution variables using existing decompression functions can be added at any time.) 
However, the procedure for adding new decompression functions is very simple. 

6 Conclusion 

MR-CDF provides an abstract interface to multi-resolution scientific data. Its program interface 
allows applications to define, store, select, and retrieve data. MR-CDF can make lower resolution 
data available quickly, allowing applications to trade off resolution for retrieval time.

MR-CDF is implemented using NASA's CDF (Common Data Format) library and runs on
any UNIX system supported by CDF. Existing CDF applications can use MR-CDF with minimal
modifications. 

MR-CDF stores multi-resolution data as a series of compressed data stages which can be 
decompressed and combined to produce the data at different resolutions. Retrieval of compressed 
data introduces a tradeoff between I/O costs and processing costs. Compression reduces the
volume of stored data, and therefore the I/O cost for its retrieval. However, the decompression 
and recombination of the data introduces processing overhead. Technological trends suggest 
such tradeoffs will become more beneficial with time. The performance of processors continues 
to improve rapidly, while access times for I/O devices have changed little. 
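
The tradeoff can be made concrete with a back-of-the-envelope model. The figures below are invented purely for illustration (they are not measurements of MR-CDF), and the model simply adds I/O time and decoding time.

```python
def retrieval_time(stored_bytes, output_bytes, io_rate, decode_rate=None):
    """Seconds to read `stored_bytes` and, if compressed, decode `output_bytes`."""
    io_time = stored_bytes / io_rate
    cpu_time = 0.0 if decode_rate is None else output_bytes / decode_rate
    return io_time + cpu_time


OUTPUT = 8_000_000     # bytes delivered to the application (made-up figure)
IO_RATE = 1_000_000    # bytes per second from the storage device (made-up figure)

raw = retrieval_time(OUTPUT, OUTPUT, IO_RATE)                         # uncompressed
slow_cpu = retrieval_time(OUTPUT // 4, OUTPUT, IO_RATE, decode_rate=2_000_000)
fast_cpu = retrieval_time(OUTPUT // 4, OUTPUT, IO_RATE, decode_rate=8_000_000)
```

Under these assumed numbers a 4:1 compressed stage already beats the uncompressed read, and the gap widens as the decode rate grows while the I/O rate stays fixed.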





Since CDF utilizes the UNIX file system, distributed operation of the MR-CDF library is
possible among machines with access to a common file system, such as NFS. We are currently
planning a distributed version of MR-CDF for systems which do not share files.

Acknowledgements 

Thanks to M. Manohar for several helpful discussions, and for providing test data for MR-CDF. 

References 

[RoCa90] Roelofs, L. H., and W. J. Campbell, “Using Expert Systems to Implement a Semantic
Data Model of a Large Mass Store System,” Telematics and Informatics, 7, 3/4, 1990,
pp. 361-377.

[ShWa88] Short, N., Jr., and S. L. Wattawa, “The Second Generation Intelligent User Interface
for the Crustal Dynamics Data Information System,” Telematics and Informatics,
5, 3, 1988, pp. 253-268.

[Tr90] Treinish, L., “The Role of Data Management in Discipline-Independent Data
Visualization,” SPIE/SPSE Symposium on Electronic Imaging Science and Technology,
February, 1990.

[TrGo90] Treinish, L., and M. Gough, “A Software Package for the Data-Independent
Management of Multidimensional Data,” Eos, 68, 28, July, 1987, pp. 633-635.

[TiMa90] Tilton, J. C., and M. Manohar, “Hierarchical Data Compression: Integrated Browse,
Moderate Loss, and Lossless Levels of Data Compression,” Proc. International
Geoscience and Remote Sensing Symposium, May, 1990, pp. 1655-1658.

[Ti89] Tilton, J. C., “Image Segmentation by Iterative Parallel Region Growing and
Splitting,” Proc. International Geoscience and Remote Sensing Symposium, May, 1989,
pp. 2420-2423.






N93-30457


High-Performance Mass Storage System for Workstations


T. Chlillii T. Teaf. L. I 






Loral AeroSys 
7373 Executive Place 
Suite 101 

Seabrook, Maryland 20706


ABSTRACT 



Reduced Instruction Set Computer (RISC) workstations and Personal Computers (PCs) are very
popular tools for office automation, command and control, scientific analysis, database
management, and many other applications. However, when using Input/Output (I/O) intensive
applications, the RISC workstations and PCs are often overburdened with the tasks of
staging, storing, and distributing data. Also, even when standard high-performance
peripherals and storage devices are used, the I/O function can still be a common bottleneck.
Therefore, the high-performance mass storage system, developed by Loral AeroSys'
Independent Research and Development (IR&D) engineers, can offload I/O-related functions
from a RISC workstation and provide high-performance I/O functions and external interfaces.

The high-performance mass storage system has the capabilities to ingest high-speed real-time
data, perform signal or image processing, and stage, archive, and distribute the data. This
mass storage system uses a hierarchical storage structure, thus reducing the total data storage
cost, while maintaining high I/O performance.

The high-performance mass storage system is a network of low-cost parallel processors and
storage devices. The nodes in the network have special I/O functions such as: SCSI controller,
Ethernet controller, gateway controller, RS232 controller, IEEE488 controller, and
digital/analog converter. The nodes are interconnected through high-speed direct memory
access links to form a network. The topology of the network is easily reconfigurable to
maximize system throughput for various applications. This high-performance mass storage
system takes advantage of a “busless” architecture for maximum expandability.

The mass storage system consists of magnetic disks, a WORM optical disk jukebox, and an
8-mm helical scan tape to form a hierarchical storage structure. Commonly used files are kept on
the magnetic disk for fast retrieval. The optical disks are used as archive media, and the tapes
are used as backup media. The storage system is managed by the IEEE mass storage reference
model-based UniTree software package. UniTree software will keep track of all files in the
system, will automatically migrate the lesser used files to archive media, and will stage the
files when needed by the system. The user can access the files without knowledge of their
physical location.

The high-performance mass storage system developed by Loral AeroSys will significantly
boost the system I/O performance and reduce the overall data storage cost. This storage system
provides a highly flexible and cost-effective architecture for a variety of applications (e.g.,
real-time data acquisition with a signal and image processing requirement, long-term data
archiving and distribution, and image analysis and enhancement).


1. INTRODUCTION 

RISC workstations and PCs are frequently used for office automation, command and control,
scientific analysis, database management, and many other applications. However, RISC
workstations and PCs usually suffer the following drawbacks:







1.1 Lack of Cost Effective High-Performance Storage Capabilities

Due to the increase of add-on board applications for workstations or PCs, the demands for cost
effective high-performance storage capabilities also increase. As a result, the hierarchical
storage architecture becomes necessary for many workstation and PC applications.

1.2 Lack of Data Acquisition Capabilities 

For some applications, data is received simultaneously from multiple sources through 
different I/O controllers. Standalone workstations or PCs usually have limited I/O 
controllers, ports, and I/O bandwidth that can handle large volume composite data streams. 
Also, equipping a workstation with several different parallel I/O controllers can be a critical issue.

1.3 Lack of Processing Power for Computing Intensive Applications

RISC workstations and PCs used as general purpose computers usually are inefficient for such
intensive computing applications as numerical analysis, image processing, and signal
processing. For example, data compression, image enhancement, and data formatting all
require extensive computing power. Therefore, additional computing power can be useful for
some workstation and PC applications.


2. OBJECTIVE

The objective of this research is to develop a cost-effective mass storage system prototype that
will provide cost-effective, unlimited storage space for the workstations and PCs and offload
data acquisition, storage management, and intensive computing functions from the
workstations and PCs.


3. APPROACH 

The high-performance mass storage system prototype consists of modular, interchangeable
hardware and software building blocks. The system's building blocks are developed using
Commercial-Off-the-Shelf (COTS) products where possible. This system's prototype is
implemented in an open environment, using a Unix operating system and X-windows. This
structure is an optimum solution for a multi-vendor environment. The hierarchical storage
structure is used to provide cost-effective storage media, and the massive parallel processing
system is used to provide the scalable I/O and data processing capabilities. Loral AeroSys'
mass storage system design can offload I/O and intensive computing functions from
workstations and PCs.

3.1 Hierarchical Storage Structure

Loral AeroSys' mass storage system prototype provides automatic migration based on
user-supplied criteria. The storage management software is designed to track the physical locations
of files, which is transparent to the user. The migration criteria can be adjusted to provide an
efficient and cost effective solution to a specific application.
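
One possible form of a user-supplied migration criterion is an idle-time threshold. The sketch below is a hypothetical illustration of that idea only; it does not reproduce UniTree's actual policy or interface, and the file names, tier labels, and times are invented.

```python
def select_for_migration(files, now, max_idle):
    """Names of disk-resident files not accessed for more than `max_idle` seconds."""
    return sorted(
        name for name, meta in files.items()
        if meta["tier"] == "disk" and now - meta["last_access"] > max_idle
    )


def migrate(files, names, target="optical"):
    """Move files to the archive tier; the user-visible name is unchanged."""
    for name in names:
        files[name]["tier"] = target


files = {
    "/data/a.img": {"tier": "disk", "last_access": 100},
    "/data/b.img": {"tier": "disk", "last_access": 900},
    "/data/c.img": {"tier": "optical", "last_access": 50},
}
stale = select_for_migration(files, now=1000, max_idle=500)
migrate(files, stale)
```

Because only the recorded tier changes, a user can keep referring to a file by the same name wherever it physically resides; adjusting `max_idle` is one way the criterion could be tuned to an application.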

3.2 Massive Parallel Processing System 

The Multi-Instruction Multi-Data (MIMD) parallel system, which provides scalable processing
power, is used to perform the storage management, data acquisition, and other computing
intensive functions for the high-performance mass storage system prototype. Loral AeroSys'
system prototype consists of arrays of I/O controllers and processors, and it can receive,
process, store, retrieve, and distribute data streams in parallel to achieve maximum
performance.





The mass storage system can be configured to be a network server providing services to all the
workstations and PCs on the network, or a dedicated I/O processor for one workstation or PC
through a dedicated link to achieve a faster archive rate.


4. MASS STORAGE SYSTEM PROTOTYPE

Loral AeroSys is currently building a mass storage system prototype based on the previously
mentioned design approach. The high-performance mass storage system building block
configuration is shown in Figure 1.

4.1 Hierarchical Storage Media Configuration 

Frequently used files are stored in a disk array of three magnetic disks of 600 Megabytes each.
The disk array is connected to the parallel processor through a SCSI bus. The archive files
are stored in an optical jukebox with two WORM drives. The jukebox can house up to 25 5.25-inch
platters with a total storage capacity of 16 Gbytes. When the jukebox is full, the platters can be
manually moved offline, and the prototype can still track the files. An 8-mm helical scan tape
system is used to provide a system log, back up files, and distribute files. The WORM drives and
tape drive are connected to a second SCSI bus.

4.2 Massive Parallel Processors 

The Parsytec Multicluster parallel processor system is used to host the mass storage device
and to provide parallel processing capability. Loral AeroSys' system prototype uses the low-cost
Inmos T800 transputer chip, rated at 25 MIPS. Four processor boards are used for the initial
mass storage system configuration. The four boards are: a root processor board with 32 Mbytes of
memory, two SCSI controller boards with 4 Mbytes of memory each, an Ethernet board with 2
Mbytes of memory, and an RS232 daughter board. Each board has one T800 chip and four
high-speed links to form a parallel processor network. The Parsytec Multicluster system runs a
Unix-compatible operating system (Helios).

The Parsytec Multicluster system is also hosted by a 386 PC through an interface board. The
PC provides the user interface to the Parsytec Multicluster system. The PC runs the MS-DOS
operating system.

UniTree software is used to manage the files and storage media. UniTree is implemented based
on the IEEE mass storage reference model; it maintains a standard Unix-style directory
structure, and provides automatic migration and network access functions. A Cygnet Jukebox
Interface Management Software (JIMS) was integrated into UniTree to provide WORM
platter mount and unmount functions.


5. PROGRESS OF THE PROTOTYPE DEVELOPMENT

The integration of storage devices to the parallel processor network, and the porting of JIMS to 
the parallel processor network are completed. 

During the porting effort, some major problems were encountered. The most serious problem was
analyzing the differences between the Unix operating systems (e.g., the System V
inter-processor communication package used by JIMS, and the Smile task scheduling package used by
UniTree). System V and UniTree Smile are not supported by the Helios operating system used
by the parallel processing system. The solution was to implement the System V capabilities and
port the UniTree Smile package to the parallel processor. Currently, all major problems are
resolved.






Figure 1. Storage System Prototype








The success of integrating storage devices and porting JIMS indicates that it is feasible to
build a cost-effective parallel I/O and data processing mass storage system for workstations
and PCs. Plans are in place to perform benchmark tests for different applications and to
add rewritable optical jukeboxes and an 8-mm tape library system.


6. CONCLUSION

Loral AeroSys IR&D's mass storage system prototype is a high-performance, flexible storage
management system using an inexpensive implementation of the UniTree file. This mass
storage system provides hierarchical file and storage management for networked,
multi-vendor computer environments, and provides complete application transparency in an open
environment.






N93-80458 

GE Networked Mass Storage Solutions Supporting
IEEE Network Mass Storage Model

Donald Herzog
GE Aerospace

Government Communications Systems Department
Camden, NJ 08102


DuraStore Mass Storage Sub-System

The General Electric Government Communications Systems Department (GE/GCSD) has
developed a near real time digital data storage and retrieval system that extends the
capabilities currently available in today's marketplace. This system, called DuraStore, uses
commercially available rotary tape drive technology with ANSI/IEEE standards for
automated magnetic tape based data storage. It uses a non-proprietary approach to satisfy a
wide range of data rate and storage capability requirements and is compliant with the IEEE
Network Storage Model.


Rotary Tape Drives (RTD)

The basic element of the system is the GE Rotary Magnetic Tape Drive (RTD) family of drives.
The drives use 19mm helical scan technology and implement both the ANSI ID-1 standard for
instrumentation data recording techniques with a BER of 10E-10, and the ANSI DD-1 standard
for storing and retrieving computer compatible data with a BER of 10E-13. The drives operate
in both streaming and asynchronous modes and are capable of handling ID-1 and DD-1 data
streams automatically within the same drive.



Standard Interfaces 

The drives are designed to be controlled using the ANSI Intelligent Peripheral Interface (IPI-3)
command set. Each drive has two interfaces to the user: one interface (low speed) is for
set-up/control, the other (high speed) for actual data transmission/reception and redundant
command control. Currently GE has implemented the following interfaces:

A. Data Interface

1. Physical

- HIPPI
- SCSI

2. Control

- IPI-3 Command Set

B. Control Interface

1. Physical

- Ethernet

2. Network

- TCP-IP

3. Control

- IPI-3 Command Set






Application Specific Hardware/Software

The heart of the drive controller is the internal buffer management hardware and the IPI-3
command processing software. The drive controller has a user side that is easily modifiable at
the factory to the desired data interface (HIPPI, SCSI, FDDI, IPI, RTD, etc.). All the drives in the
RTD family are upward compatible, and the current maximum continuous throughput of the
top of the line RTD-45 is 50 MBytes/sec streaming in the ID-1 mode and 45 MBytes/sec asynchronous
in the DD-1 mode. The drive controller can handle burst rates of 70 MBytes/sec and has a
maximum buffering capacity of 4 GBytes.


Networked Automated Tape Libraries

Another element of GE's DuraStore system is GE's Data Storage System (DSS) Automated
Tape Libraries (ATL). These libraries are designed to relieve the user/host file manager of the
physical resource management responsibilities for the library. Each library complex is
controlled by an Automated Tape Library Controller. The library can be treated as a single
logical device by the user/host. The Library Controller is capable of controlling up to four
Automated Tape Libraries simultaneously. Users communicate with the Library Controller via
Ethernet/TCP-IP using the same IPI-3 command set used to control the individual RTD drives.
The Library Controller maintains all the directory information necessary to translate a file
request from a user to the correct tape cassette, locate the cassette in its appropriate bin,
load and position the tape in an available drive, and notify the user that the data is available to
be read/written. The ATL supports mixtures of ID-1 and DD-1 volumes in the library.


Library Administrator

The Library Administrator's interface to the ATL systems is through the Library Controller
with a user friendly graphic display. In addition to the read/write functions necessary for data
storage/data retrieval, the Automated Tape Libraries support the following additional
functionality available to the Library Administrator and to system users:


1. Import - Enter Media and Files into Automated Tape Library

2. Export - Remove Media and Files from ATL

3. Directory - Volume/File Listings for Library/Volumes

4. Error Statistics -

a. BER for Drives/Trend Analysis

b. BER for Volumes/Trend Analysis

5. Full Directory Shadowing 

6. Library Diagnostics 

7. Resource Management 


Write Protection

The Automated Tape Library also allows the user to protect his individual files/volumes. This
protection is done at the volume and not the user level. Each individual user can select one of
three levels of write protection:

1. Write protection on entire volume - No more data can be written 
to that volume 

2. Write protect existing data only - Data is write protected as it is
added to volume 

3. No write protection - Overwriting allowed 


Volume/Physical Media Linkages

The GE Automated Library supports/allows all the logical volume/physical media linkages
supported by DD-1:

1 Volume to 1 Cassette
Multiple Volumes to 1 Cassette
1 Volume to Multiple Sequential Cassettes
1 Volume to Multiple Parallel Cassettes (Striping)


Maximum Resource Utilization

The Library has been designed to maximize resource utilization. It monitors activity and will
deallocate/reallocate assigned equipment if no activity takes place for prolonged periods of
time. The Library will also automatically exercise various system diagnostics when errors
take place and automatically notify the Library Administrator of actions required to
fix/further isolate problems.

The Automated Tape Libraries can support the following maximum configuration capabilities: 

I 5 RTDs per Library

II 660 Medium Cassettes (40 GBytes/cassette) per Library

III 5 million files in the data base

IV 25 TeraBytes of data/Library

The Library Controller, as a controller for 4 ATLs, will control:

a. 20 RTDs max

b. 20 million files in DB

c. 100 TeraBytes of data across 4 Libraries
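
The controller-level maxima follow from multiplying the per-library limits by the four libraries one Library Controller can manage. The short Python sketch below only reproduces that arithmetic; the class and function names are illustrative assumptions and are not part of any GE software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryLimits:
    """Per-library maxima quoted in the text (names are hypothetical)."""
    drives: int = 5             # RTDs per library
    files: int = 5_000_000      # files tracked in the data base
    terabytes: int = 25         # quoted data capacity per library


def controller_totals(limits: LibraryLimits, libraries: int = 4) -> dict:
    """Scale the per-library limits to one Library Controller."""
    return {
        "drives": limits.drives * libraries,
        "files": limits.files * libraries,
        "terabytes": limits.terabytes * libraries,
    }


totals = controller_totals(LibraryLimits())
print(totals)
```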






N93-80459


HIGH-SPEED DATA DUPLICATION/DATA DISTRIBUTION -
AN ADJUNCT TO THE MASS STORAGE EQUATION




Kevin Howard
EXABYTE
1685 38th Street
Boulder, CO 80301




The term "mass storage" invokes the image of huge on-site disk and tape farms which contain
huge quantities of low- to medium-access data. Although the cost of such bulk storage is
recognized, the cost of the bulk distribution of this data rarely is given much attention. Mass
data distribution becomes an even more acute problem if the bulk data is part of a national or
international system. If the bulk data is to travel from one large data center to
another large data center, then fiber-optic cables or satellite channels are feasible.
However, if the data must be disseminated from a central site to a number of much
smaller, and perhaps varying, sites, then cost prohibits the use of fiber-optic cable or satellite
communication. Given these cost constraints, much of the bulk distribution of data will
continue to be done via inexpensive magnetic tape using the various next day postal
service options.

For non-transmitted bulk data, our working hypotheses are that the desired duplication
efficiency of the total bulk data should be established before selecting any particular data
duplication system; and that the data duplication algorithm should be determined before any
bulk data duplication method is selected.


Building the Tools:

In order to compare data duplication hardware and various data duplication algorithms one
must first build a suite of evaluation tools. There are several parameters required to build such
a tool suite. They are:


Burst Transfer Rate. 

Sustained Transfer Rate. 

Average Pick and Place Transport Velocity. 
Average Pick and Place Time. 

Load/Unload Time. 

Average Number of Megabytes Duplicated. 
Number of Pick and Place Mechanisms. 





The Burst Transfer Rate is how fast data can be moved into a drive's on-board memory. The
Sustained Transfer Rate is how fast data can be transferred from on-board memory to the
media.

To compute the Average Pick and Place Time for a given piece of data duplication hardware we
can use the following equation:

(equation 1)

\[
APPT = \frac{\sum_{m=1}^{X} \sum_{n=1}^{Y} \left( |C_m - D_n| / V_{avg} \right)}{X \, Y \, P_p}
\]

where:

APPT = Average Pick and Place Time

C_m = Location of cartridge number m

D_n = Location of drive number n

V_avg = Average velocity of the pick and place mechanism

X = Total number of cartridges

Y = Total number of drives

P_p = Total number of pick/place devices
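
Equation 1 can be evaluated directly once the cartridge and drive locations are known. The Python sketch below is a minimal illustration under one stated assumption: locations are treated as positions along a single axis so that |C_m - D_n| is a simple distance. The function name and the sample values are hypothetical.

```python
def average_pick_place_time(cartridge_pos, drive_pos, v_avg, pickers):
    """Equation 1: mean travel time over every cartridge/drive pair,
    divided by the number of pick/place devices."""
    if v_avg <= 0 or pickers <= 0:
        raise ValueError("velocity and picker count must be positive")
    # Sum the travel time |C_m - D_n| / V_avg over all pairs (m, n).
    total = sum(abs(c - d) / v_avg for c in cartridge_pos for d in drive_pos)
    return total / (len(cartridge_pos) * len(drive_pos) * pickers)


# Hypothetical layout: two cartridge slots, one drive, unit velocity.
print(average_pick_place_time([0.0, 10.0], [5.0], v_avg=1.0, pickers=1))
```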

In order to compute the Average Load and Unload Time one can use the following equation:

(equation 2)

\[
ALUT = \frac{\sum_{m=1}^{X} (L_m + U_m) / 2}{X}
\]

where:

ALUT = Average Load and Unload Time

L_m = Load time for a particular drive

U_m = Unload time for a particular drive

X = Total number of drives
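
As a minimal sketch of equation 2, the Python function below averages the load and unload time of each drive and then averages across drives. The function name and the sample timings are assumptions chosen for illustration only.

```python
def average_load_unload_time(load_times, unload_times):
    """Equation 2: per-drive mean of load and unload time, averaged over X drives."""
    if len(load_times) != len(unload_times) or not load_times:
        raise ValueError("need one load and one unload time per drive")
    x = len(load_times)
    return sum((l + u) / 2 for l, u in zip(load_times, unload_times)) / x


# Hypothetical timings in seconds for two drives.
print(average_load_unload_time([60, 40], [50, 50]))
```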

The Load Time is the total amount of time it takes after inserting media before it can be read
from or written to. The Unload Time is the total amount of time it takes before the media can be
removed after removal is requested.

To duplicate at the maximum speed one must minimize the time not spent writing the data;
that is, minimize the time loading and unloading, as well as picking and placing the cartridges.
Minimizing the load/unload time can most easily be accomplished by duplicating the largest
possible files. The relationship between the total duplication time, load/unload time, pick and
place time, and the time spent actually writing data is given below:

(equation 3)

Total time = (load/unload time) + (pick/place time) + (write time) 

Assuming that the total time is fixed and that there is ample time to accomplish all items,
increasing the time of any one term will decrease the percentage of total time used by
the other terms. Therefore, given the above constraints, increasing the time writing data will
act to minimize the percentage of time spent loading/unloading and picking/placing. An
example of how this helps us is given below:

Questions:

What is the total time required to write the minimum and maximum file sizes for an
Exabyte EXB-120 cartridge handling system loaded with Exabyte EXB-8500 tape drives?

What are the minimum and maximum percentages of time spent writing data?

To answer these questions, the following information is needed. 

Average load/unload time = 50 seconds
Average pick and place time = 20 seconds
Sustained transfer rate = .5 MB/second

The minimum file size is given by the total time equaling 70 seconds:

70 = 50 + 20 + 0

Thus the file size must equal zero according to equation 3.

The maximum file size is 5,000 megabytes, so the write time is 10,000 seconds:

10,070 = 50 + 20 + 10,000

Thus:

The minimum time spent writing = 0%

The maximum time spent writing = 99%
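
The two bounds above can be checked with a few lines of Python. This is a sketch of the total-time relationship only (load/unload + pick/place + write time); the function names are illustrative assumptions.

```python
def total_time(load_unload_s, pick_place_s, file_mb, rate_mb_s):
    """Total time = load/unload time + pick/place time + write time."""
    return load_unload_s + pick_place_s + file_mb / rate_mb_s


def writing_fraction(load_unload_s, pick_place_s, file_mb, rate_mb_s):
    """Share of the total time that is spent writing data."""
    write_s = file_mb / rate_mb_s
    return write_s / total_time(load_unload_s, pick_place_s, file_mb, rate_mb_s)


for size_mb in (0, 5000):
    t = total_time(50, 20, size_mb, 0.5)
    f = writing_fraction(50, 20, size_mb, 0.5)
    print(f"{size_mb} MB: total {t:.0f} s, writing {f:.1%}")
```

With the figures quoted in the text, the 5,000 MB case spends a little over 99% of the total time writing.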





Unfortunately this does not answer the more important question, which is:

What is the optimum file size for the EXB-120 using EXB-8500 drives?

In order to answer the above question more information is required. The optimum file size is a
function of optimum system throughput. The optimum system throughput depends upon using
the drives efficiently. The first requirement for using the drives efficiently is the need to
minimize the time spent not writing. The second requirement is the need to keep as many
drives as possible streaming for as long as possible. In order to maximize the time spent
writing, one must overlap the time spent picking and placing. Overlapping the pick/place time
is a function of the number of drives and the number of pick/place devices. A way to compare
the effectiveness of various duplication systems in maximizing the time spent writing is to
multiply, for each system, the ratio of drives to pick/place devices by the average pick/place time
plus the load/unload time, and then take the minimum. As was discussed for equation 3, this will
maximize the writing time. This relationship is given below:

(equation 4)

\[
MEDU = \operatorname{Min}\bigl( (D_1 / P_1) \cdot (APPT_1 + ALUT_1),\;
(D_2 / P_2) \cdot (APPT_2 + ALUT_2),\; \ldots,\;
(D_n / P_n) \cdot (APPT_n + ALUT_n) \bigr)
\]

where:

MEDU = Most efficient drive usage

Min = The minimum function; this function selects the smallest value in a list of values

D_1...n = Number of drives in a particular duplication system

P_1...n = Number of pick/place devices in a particular duplication system

APPT_1...n = The average pick/place time for a particular duplication system

ALUT_1...n = The average load/unload time for a particular duplication system
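
Equation 4 amounts to scoring each candidate system by (drives / pick-place devices) x (APPT + ALUT) and keeping the smallest score. The Python sketch below illustrates this; the `DuplicationSystem` type and the second candidate system are hypothetical and serve only to exercise the comparison.

```python
from typing import NamedTuple


class DuplicationSystem(NamedTuple):
    name: str
    drives: int        # D_i
    pickers: int       # P_i
    appt: float        # average pick/place time, seconds
    alut: float        # average load/unload time, seconds


def non_writing_score(s: DuplicationSystem) -> float:
    """One term of equation 4: (D_i / P_i) * (APPT_i + ALUT_i)."""
    return (s.drives / s.pickers) * (s.appt + s.alut)


def most_efficient(systems):
    """Equation 4: the system with the smallest score."""
    return min(systems, key=non_writing_score)


candidates = [
    DuplicationSystem("4 drives, 1 picker", 4, 1, 20, 50),
    DuplicationSystem("8 drives, 1 picker (hypothetical)", 8, 1, 20, 50),
]
best = most_efficient(candidates)
print(best.name, non_writing_score(best))
```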


To keep the highest number of drives streaming simultaneously for any duplication system we
must measure the sequential load/unload efficiency. That is, if there are more drives than
pick/place devices, the ratio between drives and pick/place devices represents the need for
sequential picks and places. The efficiency of these sequential activities can be measured
as below. Please note that this is a measure of the efficiency of the pick/place algorithm and
the raw speed of the hardware, whereas equation 4 is only a measure of the speed of the
hardware.





The general equation for the system performance is:

(equation 5)

\[
\text{Performance} = \left[ 1 - \frac{\text{TOTAL NON-WRITING TIME}}{\text{TOTAL DUPLICATION TIME}} \right] \cdot \text{writing speed}
\]

Please note that this performance measure is the complement of the fraction of the total
duplication time spent not writing data, scaled by the writing speed. The larger this number is,
the better the performance.

To bound this performance measure we first give the equation which defines the worst relative
performance. The true worst case is no data being written; however, this is unreasonable.
Therefore, we will define the worst case to be the combination of all sequential drives being
stopped and then sequentially reloaded, that is:

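(equation 6)

\[
CYC = \sum_{m=1}^{X} (2 \cdot APPT_m) + ALUT + P
\]

\[
\#\text{ of Cycles} = \frac{\text{Total data written}}{\text{Data set size} \cdot \#\text{ of sequential drives} \cdot \#\text{ of pick/place devices}}
\]

\[
\text{TOTAL DUPLICATION TIME} = \frac{\text{Total data written}}{\text{Sustained transfer rate}} + \#\text{ of Cycles} \cdot (APPT + ALUT + P)
\]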

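The worst case performance measure uses the following template:

\[
\begin{aligned}
\text{Performance} &= \frac{\text{total writing time}}{\text{total duplication time}} \cdot \text{duplication speed} \\
&= \left[ 1 - \frac{\text{non-writing time}}{\text{total duplication time}} \right] \cdot \text{duplication speed} \\
&= \left[ 1 - \frac{\text{total reoccurring non-writing time} + \text{initial non-writing time}}{\text{total duplication time}} \right] \cdot \text{duplication speed}
\end{aligned}
\]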





Therefore, the worst case performance measure is given by:

\[
INIT = \sum_{n=1}^{X} APPT_n
\]

\[
\text{Performance}_w = \left[ 1 - \frac{\sum_{n=1}^{\#\text{ of Cycles}} CYC_n + INIT}{\text{TOTAL DUPLICATION TIME}} \right] \cdot DS \cdot PA
\]

where:

DS = Sustained drive transfer rate

PA = # of drives able to write in parallel

CYC = Amount of non-writing time used per cycle

INIT = Initial pick and place time

Performance_w = Sustained duplication percentage of non-writing time, worst case

X = The number of drives

# of Cycles = The number of pick/place cycles required to complete the duplication

P = Pause between each pick/place cycle


Next we give the equation which defines the best possible relative performance; 

(equation 7)

\[
INIT = \sum_{n=1}^{X} APPT_n
\]

\[
\text{Performance}_b = \left[ 1 - \frac{\sum_{n=1}^{\#\text{ of Cycles}} (APPT_n + ALUT_n + P_n) + INIT}{\text{TOTAL DUPLICATION TIME}} \right] \cdot DS \cdot PA
\]

where:

INIT = Initial start-up time

X = Number of drives

Performance_b = Sustained duplication percentage of non-writing time, best case


The pause term of equations 6 and 7 can be non-zero under a variety of circumstances. One
such circumstance is when the duplicated data set size is too small to keep all sequential drives
streaming.

To compute the best fixed data set size for duplication purposes, we use the following equation:


(equation 8)

\[
\text{Data set size} = \left( \sum_{m=1}^{X-1} APPT_m + (2 \cdot APPT) \right) \cdot \operatorname{Min}(s, b)
\]

where:

X = The total number of drives
s = Sustained transfer rate
b = Burst transfer rate

Equation 8 represents a good first approximation of the data set size needed to maximize the 
data duplication effort. As the data set size increases from equation 8 the duplication effort 
performance approaches the limit set by equation 7. 
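
A minimal Python sketch of equation 8 follows. It sums the pick/place times of the first X-1 drives, adds twice the pick/place time, and multiplies by the slower of the sustained and burst rates. Treating the unsubscripted pick/place term as the mean of the per-drive values is an assumption made here for illustration, as is the function name.

```python
def min_data_set_size(appt_per_drive, sustained_mb_s, burst_mb_s):
    """Equation 8: smallest data set (MB) that keeps the sequential drives streaming."""
    x = len(appt_per_drive)
    if x == 0:
        raise ValueError("at least one drive is required")
    sequential = sum(appt_per_drive[: x - 1])       # sum over m = 1 .. X-1
    mean_appt = sum(appt_per_drive) / x             # assumed meaning of the bare APPT term
    return (sequential + 2 * mean_appt) * min(sustained_mb_s, burst_mb_s)


# Four sequential drives, 20 s pick/place each, .5 MB/s sustained, 1.5 MB/s burst.
print(min_data_set_size([20, 20, 20, 20], 0.5, 1.5))
```

With four drives at 20 seconds each this evaluates to (60 + 40) x .5, which agrees with the 50 megabyte minimum data set size used in the worked example later in the paper.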

The final tool allows us to understand the relative cost and benefits of a duplication system.
First we define the total system cost. 





\[
TC = \sum_{m=1} D_m + (W_m \cdot C_m) + PER_m + MED_m + S \cdot F + P + E + CO
\]

where:

TC = Total duplication system cost for the life of the duplication equipment

D_m = Amortization of duplication equipment per duplication

W_m = Media weight per duplication

C_m = Cost of transportation per duplication

PER_m = Personnel cost per duplication

MED_m = Media cost per duplication

S = Size of the footprint of the duplication equipment

F = Floor space cost

P = Total price of the duplication equipment

E = Total electrical cost

CO = Total cooling cost for the duplication equipment


The system benefits were defined in equation 7 using the best case rating. Therefore the
cost benefit ratio for a duplication system is given below:

(equation 10)

\[
CBR = TC / \text{Performance}_b
\]

where:

CBR = Cost benefit ratio



Using the Tools:

We now have the tools to answer the question posed earlier, that is, what is the optimum file
size for the EXB-120 using EXB-8500 tape drives? First we restate the question as two
equivalent questions:

1 What is the minimum data set size needed to keep an EXB-120 (with EXB-8500
drives) streaming the greatest amount of the time?

2 What is the largest file size which can fit on an EXB-8500 tape?

The answer to the second question is the easiest: approximately 5 gigabytes. The answer to the
first question is computed as follows:


APPT = 20 seconds
Number of sequential drives = 4
Internal drive buffer size = 1 megabyte
Burst rate = 1.5 MB/second
Sustained rate = .5 MB/second
Minimum data set size = 50 megabytes

Therefore:

The best duplication data set sizes are greater than or equal to 50 megabytes and less
than or equal to 5,000 megabytes.

We can also answer these questions: 

- What is the best performance for an EXB-120 with 8500 drives in the data
duplication application?

- What is the worst reasonable performance for an EXB-120 with 8500 drives in the
data duplication application?

To answer the first question we need the following information:


APPT = 20 seconds

ALUT = 50 seconds

Pause = 0 seconds

INIT = 80 seconds

Total data written = 4000 MB

Total duplication time = 9400 seconds

DS = .5 MB/second

PA = 4 drives

# of Cycles = 20 cycles

Therefore:

Performance_b = 1.68 MB/second duplication rate

To answer the second question we need the following information:

Cyc = 210 seconds

Therefore:

Performance_w = 1.09 MB/second duplication rate
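
The best and worst case figures for the single EXB-120 can be reproduced from the quantities listed above. The Python sketch below follows the performance template (one minus the non-writing share, times DS times PA) with the per-cycle non-writing time taken as APPT + ALUT + Pause in the best case and as Cyc in the worst case. The function names and default arguments are illustrative assumptions.

```python
def cycles(total_mb, data_set_mb, sequential_drives, pickers):
    """Number of pick/place cycles needed to write all the data."""
    return total_mb / (data_set_mb * sequential_drives * pickers)


def total_duplication_time(total_mb, rate_mb_s, n_cycles, appt, alut, pause):
    """Write time plus the per-cycle handling time."""
    return total_mb / rate_mb_s + n_cycles * (appt + alut + pause)


def performance(non_writing_s, total_s, ds, pa):
    """Share of time spent writing, scaled by DS * PA."""
    return (1 - non_writing_s / total_s) * ds * pa


def exb120_bounds(appt=20, alut=50, pause=0, drives=4, pickers=1,
                  total_mb=4000, data_set_mb=50, ds=0.5):
    n = cycles(total_mb, data_set_mb, drives, pickers)
    total_s = total_duplication_time(total_mb, ds, n, appt, alut, pause)
    init = drives * appt                        # initial pick/place for each drive
    best_nw = n * (appt + alut + pause) + init
    cyc = drives * 2 * appt + alut + pause      # worst case: drives reloaded in sequence
    worst_nw = n * cyc + init
    return (performance(best_nw, total_s, ds, drives),
            performance(worst_nw, total_s, ds, drives))


best, worst = exb120_bounds()
print(f"best  = {best:.3f} MB/second")
print(f"worst = {worst:.3f} MB/second")
```

The intermediate values (20 cycles, 9400 seconds, INIT of 80 seconds, Cyc of 210 seconds) match the inputs quoted in the text, and the results are approximately 1.685 and 1.089 MB/second.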

If we replace the EXB-8500 drives with their equivalent half height versions, what would be the
best performance? The correct data set size should be 90 MB; however, we kept the data set
size at 50 MB so that we can compare equivalent situations.

APPT = 20 seconds

ALUT = 50 seconds

Pause = 0 seconds

INIT = 160 seconds

Total data written = 4000 MB

Total duplication time = 1700 seconds

# of Cycles = 10 cycles

DS = .5 MB/second

PA = 8 drives

Therefore:

Performance_b = 1.98 MB/second

In order for us to be able to compare devices with the same number of drives, we will now
compute the performance of two EXB-120s with EXB-8500 drives. This data set size should be
50 MB.

APPT = 20 seconds

ALUT = 50 seconds

Pause = 0 seconds

INIT = 80 seconds

Total data written = 4000 MB

Total duplication time = 1600 seconds

# of Cycles = 10 cycles

DS = .5 MB/second

PA = 8 drives

Therefore:

Performance_b = 2.3 MB/second


We see that two EXB-120s with EXB-8500 drives perform the data duplication task better than
one EXB-120 with the equivalent half height drives. The natural next question would be:
which data duplication equipment gives the best cost/benefit ratio? This question is
answered below.

Because of the similar nature of the equipment being compared, we can make several
simplifying assumptions before calculating the total cost. These assumptions are given below:

- The amortization cost would be approximately the same and therefore does not need to
be included

- The media weight and cost of transportation would be the same and therefore do not
need to be included

- Personnel costs would be the same and therefore do not need to be included

- Floor space size difference is negligible and therefore does not need to be included

- Energy cost differences are negligible and therefore do not need to be included

- Cooling cost differences are negligible and therefore do not need to be included

P (two EXB-120s with EXB-8500 drives) = 200,000 dollars

CBR = 200000 / 2.3

= 86,956

P (EXB-120 with half height drives) = 100,000 dollars

CBR = 100000 / 1.98

= 50,505


As can be seen, given the above assumptions and the amount of data to be duplicated, the better
value would be the single EXB-120 with eight half height drives versus two EXB-120s with full
height drives.
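
The comparison reduces to dividing each configuration's price by its best case duplication rate; the lower ratio is the better value. A short Python sketch of that calculation follows, using the prices and rates quoted above. The function name and dictionary labels are illustrative.

```python
def cost_benefit_ratio(total_cost_dollars, performance_mb_s):
    """CBR = TC / Performance_b (dollars per MB/second of duplication rate)."""
    return total_cost_dollars / performance_mb_s


configs = {
    "two EXB-120s, EXB-8500 drives": (200_000, 2.3),
    "one EXB-120, half height drives": (100_000, 1.98),
}
ratios = {name: cost_benefit_ratio(cost, perf)
          for name, (cost, perf) in configs.items()}
best_value = min(ratios, key=ratios.get)
for name, cbr in ratios.items():
    print(f"{name}: {cbr:,.0f}")
print("better value:", best_value)
```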






N93-80460




The Fundamentals and Futures of Removable Mass Storage Alternatives

Linda Kempster, President
Strategic Management Resources, Ltd.
6509 LIm Lane, Bowie, Maryland 20720-4706


INTRODUCTION


This article reflects my view of how the storage products have been introduced into the
marketplace, where they came from, and where others will continue to come from in the future.
My corporate goal is to be a resource for those searching for removable solutions to mass
storage problems.


My introduction to optical storage occurred a few months before signing a non-disclosure
agreement with FileNet on August 8, 1983. By 87 or 88, as the optical craze was getting more
popular, I started looking for similar or complementary storage technologies. I am still looking,
and research is constantly turning up new entrants into this field. Due to the scope of the
coverage in this field, this article does not dwell on any single technology. The goal is to
provide information that is not compiled in any other single source and focus on facts that are
not commonly known.


I have provided a few baseline assumptions to ensure the mathematical calculations remain
consistent. 1) Hard-copy 8.5" x 11" documents which are scanned at 200 dots per inch (dpi)
and compressed at a ratio of 10:1 result in a document image which requires an average of 50
Kilobytes (KB) of storage. 2) An average ASCII page requires 2 KB of storage. 3) An average file
cabinet drawer can hold 2,500 pieces of paper. 4) One GB of storage can hold an average of
20,000 document images. 5) A reel of 6250 tape holds 180 Megabytes (MB).



HELICAL SCAN TAPE CASSETTES

Cassettes have become very familiar. Using helical scan technologies, Metrum Information
Storage has developed a drive that can write 14.5 GB on a T-120 Super VHS cassette, providing
a storage media price of $.002 per MB. In storage equivalents, that roughly equates to:

• 290,000 document images • 29 four-drawer file cabinets

• 80 reels of 6250 tape • 7,250,000 ASCII documents
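
These equivalents follow from the baseline assumptions stated in the introduction. The Python sketch below reproduces them for any capacity; it assumes decimal units (1 GB = 1,000 MB = 1,000,000 KB), which is what the quoted figures imply, and the function and constant names are illustrative.

```python
IMAGES_PER_GB = 20_000       # 50 KB per compressed document image
ASCII_PAGE_KB = 2            # average ASCII page
PAGES_PER_DRAWER = 2_500     # pieces of paper per file cabinet drawer
REEL_6250_MB = 180           # capacity of one reel of 6250 tape


def storage_equivalents(capacity_gb, drawers_per_cabinet=4):
    """Express a capacity in the article's everyday equivalents."""
    images = capacity_gb * IMAGES_PER_GB
    return {
        "document_images": round(images),
        "file_cabinets": round(images / (PAGES_PER_DRAWER * drawers_per_cabinet)),
        "reels_6250": int(capacity_gb * 1000 // REEL_6250_MB),   # whole reels only
        "ascii_pages": round(capacity_gb * 1_000_000 / ASCII_PAGE_KB),
    }


print(storage_equivalents(14.5))
```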

An autochanger with a footprint of 21 square feet can hold 600 cassettes and provide an
automated storage capacity of 8,700 GB or 8.7 Terabytes (TB) at a system cost of $0.06 per
MB. Access to any file on a mounted tape takes 45 seconds.






Sony, Hitachi and Ampex have developed three basic sizes of digital cassettes: small, medium
and large. In the video world, there are two different ways of recording data on these media.
Basically, D-1 technology refers to a video signal that is divided into three separate
components, digitized, then recorded onto tape. During playback, the three streams are output
as three analog signals. This tape format meets instrumentation standard ANSI X3B.6. The
D-2 technology is different in that after the video signal is divided into three component signals,
they are combined before digitization and recording onto tape. The output during playback is a
combined analog signal. The D-2 video tapes are made by Hitachi, Sony, Ampex, Fuji, Maxell,
TDK and 3M. Taking advantage of digital recording market opportunities and the availability of
standardized media, RCA, E-Systems, Ampex, and Sony have developed helical scan digital
recorders which provide the storage formats and data capacities listed below:

• D-1S 16 GB • D-2S 25 GB

• D-1M 44 GB • D-2M 75 GB

• D-1L 100 GB • D-2L 165 GB

Ampex currently has an autochanger available that will hold 256 D-2S cassettes for a total
automated capacity of 6.4 TB. E-Systems offers a 220-unit data tower that can provide
automated access to 5.5 TB. Each of these units requires less than 21 square feet. Multiple
libraries can be linked together to provide even more robotically-addressable storage capacity.
Library systems are in use today. In a standalone mode, RCA can simultaneously write to 1, 2,
3, or 4 D-1 cassettes at 400 Megabits per second (Mbps) each and reach a maximum of
1.6 Gigabits per second (Gbps).

Helical scan technology has also been applied to smaller tape formats. Several companies,
including Hewlett Packard, Sony, Hitachi, GigaTrend and Archive, have introduced a 4
millimeter (mm) cassette that holds 2 GB of uncompressed data on 60 meters of tape. There
are organizations using this computer-grade tape to replace COM (computer output microfilm)
and master CD-ROMs. For automated applications, library units are available that manage up
to 60 cassettes. Another member of the small cassette group is being offered by Exabyte. The
8 mm format holds 5 GB of uncompressed data. Vendors such as Bull, IBM, Sun, NCR and
Wang support this technology. Mass storage libraries have been developed that will hold up to
432 cassettes, providing up to 2,160 GB of robotically-addressable storage.







The mainframe environment is served by two helical scan technologies from MASSTOR. Their
tape cartridge is 6.5 inches square and 3 inches wide. The capacity is almost 32 GB. The
library unit for this format holds 32 cartridges and provides 1 TB. They also provide a
tube-shaped unit, 1.8 inches in diameter and 3.4 inches long. The library unit which holds these
350 MB units can accommodate 316 "tubes". The honey-comb shaped interior of the library
uses gravity to move the tape units into the readers.



MICROFICHE

A laser-based microfiche technology has been introduced by IBASE Systems Corporation. This
printing device allows a user to select images from an optical or magnetic storage unit and
print them at either 200 or 300 dpi on microfiche. Using approved film-development methods
can produce archivable images and provide acceptable back-up optical storage.



LONGITUDINAL RECORDING TAPE CARTRIDGES 

Carlisle Memory Products and 3M jointly developed the 1/4" cartridge format. The cartridge is
4" x 6" and can hold 2.1 GB. The use of barium ferrite media in the quarter inch cartridge
(QIC) may support the storage of 35 GB by 1995. There are currently over 7 million QIC drives
in use.

Digital Equipment Company has over 300,000 tape drives using their standardized .5 inch
cartridge for storage. They have introduced an external drive unit that uses the tapes for
removable storage. The 4.1 inch square tape cartridges hold 2.6 GB. The next generation will
hold 5.2 GB and by 1994, the capacity could reach 50 GB. A small library unit holds 7 tape
cartridges.

Storage Technology has developed and sold over 4,000 library units which house 6,000 of the
200-MB IBM 3480 cartridges and provide 1.2 TB of robotically-addressable storage. The
modular libraries have a footprint of 121 square feet and are large enough to allow a technician
to actually enter the unit to provide any service necessary. The libraries serve IBM and over 15
non-IBM platforms. Up to 16 libraries can be linked to offer 19.2 TB of data storage. The next
generation of cartridge, the 3490, offers a capacity of 400 MB with 2:1 compression. After that,
36 track tapes will be introduced and the native capacity will reach 800 MB. Future
developments will support helical scan recording resulting in 20 GB per tape. The strategic direction
of the company is to introduce systems more in line with an office environment. Using this
same cartridge, Memorex/Telex has the capacity to manage over 1.3 million tapes using a
combination of libraries.

LaserTape Systems has used the 3480 cartridge in conjunction with non-erasable digital paper,
often called optical tape. By cutting digital paper into .5 inch wide strips 541 feet long, the
company can store up to 100 GB in a single unit. Using the math presented in the previous
paragraph, the library capacity expands to 600 TB in 121 square feet. Sixteen libraries would
provide 9,600 TB or 9.6 Petabytes of robotically-addressable storage. At $250 per cartridge,
the media price would be $.0025 per MB.
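
The library figures in this and the previous paragraph are straightforward products of cartridge count and cartridge capacity. The Python sketch below reproduces them, assuming decimal units (1 TB = 1,000,000 MB); the function names are illustrative.

```python
def library_capacity_tb(cartridges, cartridge_mb):
    """Capacity of one library in TB, given cartridges of a fixed size in MB."""
    return cartridges * cartridge_mb / 1_000_000


def media_price_per_mb(cartridge_price_dollars, cartridge_mb):
    """Media price in dollars per MB."""
    return cartridge_price_dollars / cartridge_mb


# 6,000-slot library with 200 MB 3480 cartridges, and with 100 GB optical tape units.
tape_3480 = library_capacity_tb(6000, 200)
optical_tape = library_capacity_tb(6000, 100_000)
print(tape_3480, 16 * tape_3480)          # one library, sixteen libraries (TB)
print(optical_tape, 16 * optical_tape)    # one library, sixteen libraries (TB)
print(media_price_per_mb(250, 100_000))   # dollars per MB
```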





CARD-BASED TECHNOLOGIES 

Storage systems that need to operate in mobile environments may use the low-cost option
offered by numerous vendors supporting the 2.86 MB optical card. This credit-card sized
optical storage media can store 1,430 ASCII text documents or 57 document images using
WORM technology on a media carrying a 10-year life expectancy. These cards are being used
for personalized medical record storage. Within the year, a multi-layer phase change card may
be introduced that has the projected capacity of 1 GB. The entertainment industry could use
this media to record up to 10 audio CDs or a full length movie. This revolutionary commercial
introduction would truly inspire the techofan!


Chip cards can provide 8 KB of storage. The 8 bit microprocessor may support CPU, RAM,
ROM or EEPROM functions. Memory-only chip cards can provide 2 KB of nonvolatile data
storage.

Magnetic strip cards can be used as coin replacement units for mass transit operations or
telephone services. The reusable cards are being used by USPS, NYNEX, GTE and Canada
Post.






Memory cards are coming in numerous formats, complete with a variety of storage capacities
ranging from 2 MB to 64 MB. These plug-in memory formats are geared to serve the notebook
or laptop industry.





COMPACT DISC READ ONLY MEMORY 

One of the least expensive media for mass distribution of reference-type database information
is the 12 cm Compact Disc Read Only Memory (CD-ROM). A 650 MB disc can hold 13,000
document images or 325,000 pages of ASCII text. An autochanger is on the market that can
hold 250 discs and provide access to a networked library of published material. One vendor
has introduced a single unit which houses 64 drives in a single cabinet.

There are emerging applications for this inexpensive medium. Write-once drives are
available to those who wish to store non-erasable information on inexpensive CDs. The drives
to read these discs are much less expensive than other write-once drives. The CD Recordable medium is
being used to capture mainframe data and play it back on inexpensive drives. Photo CDs
will soon be available in consumer photo development stores. The 250 million cameras in place
today will provide the capture devices. The images will be displayable on most TV screens. A
technology has been reintroduced with the announcement of the 3.5 inch read only memory
(O-ROM) disc. The capacity is 122 MB and, because the discs are smaller and lighter, they spin up to
9 times faster and provide faster access and seek times. Unlike the first introduction of CD
media this size, it is in a protective case, similar to a floppy cartridge.



REWRITABLE DISKS 

One of the reasons that WORM disks were not readily accepted into the marketplace is that
data processing professionals did not like the permanence associated with the media.
According to Dr. Robert Freese, magneto-optic (m/o) technology uses some principles of
magnetic storage to store data on 5.25" (13 cm) optical disks. Storage capacities range from
400-1500 MB. These disks have found homes beside high capacity workstations and in




jukebox environments. Multimedia drives have been introduced by numerous vendors which
will write to either 5.25" WORM or erasable platters.

The emerging format of erasable technology is the 3.5" m/o with the standardized capacity of
128 MB. There are disks available with 256 MB capacity and projected capacities reach 520
MB. These disks can spin faster and provide faster seek times. There are currently 18 vendors
announcing this format and it is expected that these disks will be more popular for desktops
and notebooks than the 5.25".

Other 3.5 inch media include the barium ferrite disks with a capacity of 21 MB. At $25 per
disk and under $300 for a drive, this is becoming an attractive medium for smaller applications.



12 & 14 INCH OPTICAL MEDIA

LMSI markets a revolutionary 12" WORM drive. The drive operates with a dual head so that
each side of the disk is available to the user simultaneously. The user is provided with 5.6 GB
of storage without the need to flip the disk in a standalone environment. To complement this
drive, the company offers a 5-platter magazine that fits as a single unit into a slightly different
version of the new drive, providing 28 GB of storage with a disk swap time of less than 3
seconds. For security, the magazine can easily be vaulted when necessary. The next
generation of this system will provide 5 - 6 GB per side of each optical platter. Other 12"
WORM drives have been introduced that will store up to 9 GB per platter.

Kodak's latest 14" optical platter has a capacity of 10.2 GB and incorporates a non-erasable
form of phase-change technology. The 100-platter jukebox provides 1.02 TB of storage.

Rewritable video disks offer dense storage for analog images. Access time between frames is
less than 1 second. The disks hold up to 108,000 black and white or full color images (vs
13,000 on a CD-ROM) and, with the right adapter, the disk appears to the system as a "large"
CD.






DIGITAL PAPER

The last technology is one of the most exciting and versatile. Imagine a roll of aluminum foil
that is silver on one side, golden on the other, and very durable feeling. This is what ICI
Imagedata calls Digital Paper. Cut into long strips and wrapped around a reel, it has been
referred to as optical tape. A 12.5 inch reel of this magic medium can hold 1 TB of data and
someday may hold up to 40 TB. According to figures from The Sierra Club, the storage of 1 TB
of ASCII data on this medium instead of paper would save 42,500 trees. Metrum currently sells
the CREO drive in the US. Canada now has three of these units running to store satellite data.
Australia has two systems doing the same thing. Other sales are pending throughout the world
to support a variety of applications including supercomputing sites, oil data storage and
medical imaging.


© pending by Strategic Management Resources, Ltd. All rights reserved.




N93-80461 


THE NT DIGITAL MICRO TAPE RECORDER



10833 VifcBe j nrl e w Boaievart 
C^rprnB.Oi 00830 




INTRODUCTION

The description of an audio recorder may at first glance seem out of place in a conference which
has been dedicated to the discussion of the technology and requirements of mass data storage.
However, there are several advanced features of the NT system which will be of interest to the
mass storage technologist. Moreover, there are a sufficient number of data storage formats in
current use which have evolved from their audio counterparts to recommend close attention
to major innovative introductions of audio storage formats.

While the existing analog micro-cassette recorder has been (and will continue to be) adequate
for various uses, there are significant benefits to be gained through the application of digital
technology. The elimination of background tape hiss and the availability of two relatively
wideband channels (for stereo recording), for example, would greatly enhance listenability and
speech intelligibility. And with the use of advanced high-density recording and LSI circuit
technologies, a digital micro recorder can realize unprecedented compactness with excellent
energy efficiency.

This is what has been accomplished with the NT-1 Digital Micro Recorder. Its remarkably
compact size contributes to its portability. The high-density NT format enables up to two
hours of low-noise digital stereo recording on a cassette the size of a postage stamp. Its highly
energy-efficient mechanical and electrical design results in low power consumption: the unit
can be operated up to 7 hours (for continuous recording) on a single AA alkaline battery.
Advanced user conveniences include a multifunction LCD readout. The unit's compactness and
energy efficiency, in particular, are attributes that cannot be matched by existing analog and
digital audio formats. The size, performance, and features of the NT format are of benefit
primarily to those who desire improved portability and audio quality in a personal memo
product.

The NT Recorder is the result of over ten years of intensive, multi-disciplinary research and
development. What follows is a discussion of the technologies that have made the NT possible:

(1) NT format mechanics

(2) NT media

(3) NT circuitry and board


NT MECHANICS 

In order to achieve the required high areal recording density, the NT format employs the now-familiar
rotary head double-azimuth helical-scanning system. The technique used in the NT
format, however, represents a significant departure from the rotary-head designs used in VCRs
and DAT recorders. Specifically, the small size of the NT system has made it necessary to take
entirely new approaches to loading and tracking.

With conventional rotary-head systems, the transport must include a loading mechanism that
withdraws the tape from the cassette shell and wraps it around a portion of the head drum.
Beta, VHS, 8mm video, and DAT mechanisms all employ some variation of this technique,
using either a U or an M loading pattern. Such mechanisms are necessarily complex as the







tape must be handled with great precision and care. These designs are also not space-efficient
because a significant volume in front of the cassette must always be set aside to permit the tape
wrap. With conventional rotary-head systems, therefore, it is impossible to reduce size,
weight, and cost beyond a certain point. Moreover, the nature of these tape-wrapping
mechanisms is such that cassettes cannot be loaded or ejected while the power is off.

The non-loading system employed in the NT format is a novel solution to these problems. As
shown in Figure 1, there is no need to withdraw the tape from the shell. Tape wrap is instead
accomplished by inserting the head drum assembly into the front opening of the cassette. Built
into the cassette shell are molded tape guides which serve the same function as the inclined and
vertical guides in conventional rotary-head systems. Pressure rollers, too, are an integral part
of the shell, making the head drum and capstan the only external elements that need be
engaged with the cassette. Since tape travel is fully contained within the cassette, the non-loading
system provides the high-density recording benefits of a rotary-head design while
preserving much of the simplicity and space-efficiency of conventional fixed-head systems.

At the heart of the NT format is the non-tracking system technology. (NT is derived from Non-Tracking.)
Achieving higher recording density entails shorter wavelengths, thinner tape, and
narrower tracks. While advanced magnetic head, metal-evaporated tape, and signal
modulation technologies (discussed below) all contribute to the attaining of the NT format's
very high recording density, the narrow track width requirement creates certain problems.
With such narrow tracks, read/write performance (data integrity) would be severely
compromised by the slightest imprecision in tracking. Unit-to-unit compatibility would be
difficult to ensure. These problems are further compounded by the limited practical tolerances
in the NT cassette's built-in tape guides and the extremely short length of exposed tape with
which the non-loading system must work. In fact, these circumstances make it impossible to
use conventional tracking schemes.

These tracking issues could only be addressed by abandoning the traditional approach
based on high-precision servo control. The non-tracking playback method employs double-scanning
combined with high-speed memory to accurately read all of the recorded data.
Because the NT does not rely on tracking precision, it eliminates the need for fixed control
heads and automatic track finding signals and circuits, making the entire system considerably
simpler and smaller.

In conventional rotary-head systems, there must be a one-to-one tracking correspondence
between record and playback. That is, since two heads with opposite azimuths alternately lay
down successive tracks during record, each track must be traced at the same angle by the
corresponding head during playback. If this is not done precisely, mistracking occurs and data
are lost. With the non-tracking playback method, the one-to-one tracking correspondence with
the recorded tracks is intentionally altered. The speed of the head drum rotation is doubled,
resulting in a double-density scan. Moreover, because the tape speed remains the same, the
actual head trace path during playback is, in effect, further inclined. This is shown in Figure 2.

This figure also shows how the non-tracking method can read all of the recorded data despite
the skewed head trace. Consider the output from head A as it makes four scans at double speed.
Demodulating the RF output to digital signals results in data strips 1 through 4, corresponding
to the four traces. (Note that although only 32 data blocks are shown in this illustration for
simplicity, each track actually comprises 104 data blocks.) Focusing our attention on the data
contained on track A, we see that each data strip 1 through 4 contains part of the information
on the track with a certain amount of overlap. Interspersed among track A's blocks are data
from other tracks. Error blocks occur whenever head A tries to read data from a track recorded
by head B because of the azimuth discrepancy. The four traces contain all the data blocks
necessary to fully reconstruct track A. The information at this point, however, is out of
sequence.





The out-of-sequence data are fed to a buffer memory. By combining random access with sequential
reading, the data are compiled into the correct order. During recording, the data would be
written with sequential memory. The output of the memory is clocked by a quartz reference
oscillator, then error-corrected prior to D/A conversion. The system recognizes the correct
position in memory for each data block (Figure 4). Each data block actually consists of
smaller sub-blocks, one of which contains an address. Since each block, or piece of the jigsaw
puzzle, is represented by a unique address that corresponds to a unique position in the buffer
memory, the process is quite simple.
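
The reassembly step can be illustrated with a minimal sketch. It is not the NT firmware: the pass windows, the unreadable blocks, and the names `head_pass` and `reassemble` are hypothetical, chosen only to show how address-tagged blocks that arrive out of order, with overlap and gaps, fall into place when each one is written to the buffer position named by its address and the buffer is then read sequentially. The figure of 104 blocks per track is the one given in the text.

```python
BLOCKS_PER_TRACK = 104  # block count per track, as stated in the text


def head_pass(track, first, last, unreadable=()):
    """Return the (address, payload) pairs one skewed head trace recovers.

    Blocks listed in `unreadable` stand in for error blocks that this
    particular trace could not read (hypothetical values).
    """
    return [(addr, track[addr]) for addr in range(first, last + 1)
            if addr not in unreadable]


def reassemble(passes, n_blocks=BLOCKS_PER_TRACK):
    """Write each block at its own address, then read the buffer in order."""
    buffer = [None] * n_blocks
    for trace in passes:
        for address, payload in trace:
            buffer[address] = payload  # random-access write keyed by address
    return buffer                      # sequential read-out


# One recorded track; the payloads are placeholders.
track_a = [f"A{n:03d}" for n in range(BLOCKS_PER_TRACK)]

# Four overlapping traces, deliberately listed out of sequence.
passes = [
    head_pass(track_a, 50, 85, unreadable={60, 61}),
    head_pass(track_a, 0, 30),
    head_pass(track_a, 80, 103),
    head_pass(track_a, 25, 62, unreadable={55}),
]

recovered = reassemble(passes)
print("track fully reconstructed:", recovered == track_a)
```

Because every block carries its own address, the order in which the traces arrive does not matter, and a block missed by one trace is filled in by any overlapping trace that did read it.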

The non-tracking method described above depends for its operation on the speed of the
readback head being greater than that of the recording head: in this way each read track
intersects several written tracks. But changing drum rotation obviously does not allow for a
smooth transition between record and playback. Also, the servo would require some special
function in order to track during ramp-up and ramp-down. However, the same non-tracking
operation can be achieved using a constant-velocity drum if the read/write track width ratio is
adjusted so that two read passes are made for each write track. The readback redundancy thus
obtained can be used to reconstruct the written data in an analogous, albeit less intuitive,
fashion, as can be seen from study of Figure 3.

Figure 5 is a block diagram that shows the data flow and signal processing involved during NT
format playback.

Unlike conventional rotary-head systems, the NT format eliminates the need for tracking
servo control. Incorrect tape speed, nevertheless, can cause a discrepancy between the rate at
which data are written to memory and the rate at which they are read from memory, the latter
being determined by the quartz reference clock. Such discrepancies can cause memory
overflow or underflow. Therefore, the NT format requires servo control to regulate tape speed.
Figure 6 is a block diagram of the NT playback servo system.

The address values from the playback data are read and compared to reference address values
generated by the reference clock. The difference between these address values must be kept
reasonably constant in order to prevent memory overflow or underflow. To ensure this, a
phase error is derived by subtracting an offset value from the address difference value. A
digital low pass filter averages the phase error values over time, the gain is adjusted, and the
signal is summed with the speed error component obtained via the motor tachometer. The
resulting servo data are converted into a PWM (pulse width modulation) signal.

When the carrier component of the PWM signal is removed by a low-pass filter, a motor drive
voltage results. The entire servo loop works to keep the amount of data in the non-tracking
memory buffer approximately constant.
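
The signal chain just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the NT servo itself: the first-order filter, the gain values, the 8-bit PWM range, and the sign convention (a positive error means the tape is running fast, so the drive is reduced) are all hypothetical.

```python
def make_tape_speed_servo(offset, lpf_alpha=0.5, phase_gain=2.0, pwm_max=255):
    """Build a step function for a tape-speed servo loop (illustrative only).

    offset     -- target gap between playback and reference addresses
    lpf_alpha  -- smoothing factor of an assumed first-order digital low-pass filter
    phase_gain -- assumed gain applied to the filtered phase error
    pwm_max    -- assumed full-scale PWM duty value
    """
    state = {"filtered": 0.0}
    midpoint = pwm_max // 2

    def step(playback_address, reference_address, speed_error):
        # Phase error: address difference minus the offset value.
        phase_error = (playback_address - reference_address) - offset
        # The digital low-pass filter averages the phase error over time.
        state["filtered"] += lpf_alpha * (phase_error - state["filtered"])
        # Gain adjustment, then summation with the tachometer speed error.
        servo_value = phase_gain * state["filtered"] + speed_error
        # Convert to a PWM duty value; assumed convention: positive error
        # means the buffer is filling too quickly, so the drive is reduced.
        duty = midpoint - round(servo_value)
        return max(0, min(pwm_max, duty))

    return step


servo = make_tape_speed_servo(offset=10)
balanced = servo(playback_address=110, reference_address=100, speed_error=0.0)
running_fast = servo(playback_address=130, reference_address=100, speed_error=0.0)
print(balanced, running_fast)
```

When the address difference equals the offset the duty stays at its midpoint; when playback addresses run ahead of the reference the duty drops, which under the assumed convention slows the tape and lets the buffer drain back toward its target level.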

In the non-loading system, approximately one-third of the head drum diameter is inserted into
the cassette shell. To make this possible, the diameter of the drum must be small, and the side
of the assembly that is inserted into the cassette must be extremely thin. Furthermore, because
the rotary head assembly is inserted at an angle, the top and bottom of the drum must be
constructed with slanted cuts. These requirements have been met through the use of an ultra-high-precision
miniature three-layered drum. The diameter of the drum is 14.8 mm as
specified by the NT format. The side of the assembly that is inserted into the cassette is only
4 mm thick.

A highly sensitive head is necessary to ensure playback RF output because the track is narrow,
the recording wavelength is short, and the relative speed is slow due to the small size of the
drum. It is also difficult to ensure head contact because the tape width is narrow and the tape
tension is low. This problem can be resolved by using the MIG (Metal In Gap) head. Known as a
double azimuth type, it maximizes extremity shape and alignment, and is positioned on an
ultra-small platform.





Because of considerations of RF characteristics, efficiency and induced noise, the rotary
transformer had to be placed inside the rotating head. To achieve the winding, an extra-thin
tape was employed on the small-diameter core.

The combined effects of the rotary head transformer, adjacent crosstalk and saturation
recording place limits on the minimum frequency and/or maximum flux reversal length.
Furthermore, in order to adequately suppress crosstalk between adjacent tracks through
alternate-azimuth recording, a low frequency component should not exist. This means that a
DC-free modulation code which has a low maximum-to-minimum frequency ratio is required.

This led to the development of the LDM modulation. Based on the MFM used generally for
floppy disks, LDM-2 (Low Deviation Modulation) is free of any DC component and suppresses
low frequencies. The minimal flux reversal interval is 1 T, and the maximum flux reversal
interval is 2.5 T (1 T is the equivalent of a 1-bit interval before modulation) (see Figure 7).
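
The two interval limits quoted above can be expressed as a simple run-length check. The sketch below does not implement the LDM-2 encoding rules, which are not given here; it only tests whether a two-level waveform respects the 1 T and 2.5 T limits. Sampling the waveform every T/2 and the function names are assumptions made for illustration.

```python
from fractions import Fraction

T_MIN = Fraction(1)     # minimal flux reversal interval, in units of T
T_MAX = Fraction(5, 2)  # maximal flux reversal interval, in units of T


def reversal_intervals(levels):
    """Run lengths of a two-level waveform sampled every T/2 (assumed grid)."""
    runs, count = [], 1
    for previous, current in zip(levels, levels[1:]):
        if current == previous:
            count += 1
        else:
            runs.append(Fraction(count, 2))
            count = 1
    runs.append(Fraction(count, 2))
    return runs


def within_interval_limits(levels):
    """True if every complete run lies between T_MIN and T_MAX.

    The first and last runs are skipped because they may be truncated
    by the edges of the sample window.
    """
    complete_runs = reversal_intervals(levels)[1:-1]
    return all(T_MIN <= run <= T_MAX for run in complete_runs)


# Hypothetical waveforms: runs of 1 T, 1.5 T, 2.5 T, 1 T, 1 T, then one with a 0.5 T run.
ok_wave = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1]
bad_wave = [1, 1, 0, 1, 1]

print(within_interval_limits(ok_wave), within_interval_limits(bad_wave))
print("maximum-to-minimum interval ratio:", T_MAX / T_MIN)
```

The ratio of the longest to the shortest interval, 2.5, bounds the ratio of the highest to the lowest fundamental frequency in the recorded signal, which is the property the preceding paragraph asks of the code.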

Although the NT format utilizes a rotary-head system, the cassette is similar to most fixed-head
cassette formats in that it has two sides. At the end of one side, the cassette can be turned
over to continue record/play on the other side. A 120-minute NT cassette, therefore, provides
60 minutes per side. Figure 8 diagrams the location of the forward and reverse tracks on the
tape, illustrating how the rotary-head helical-scan system can be implemented in a bi-directional
design.

To accommodate bi-directional operation, the cassette lid is hinged symmetrically, enabling it
to open upward or downward. The bi-directional design also dictates the inclusion of two
pressure rollers, one at each forward corner of the cassette. The action of inserting the cassette
automatically opens the lid; the capstan is then pressed against the take-up-side pressure
roller to initiate tape drive.

The Aramid base film is characterized by a high Young's modulus, and it has enabled the
development and refinement of a manufacturing process which ensures film adhesion of the
deposited metal layers. It also results in low friction loss at each reel, enabling low-tension
tape drive. Consequently, the NT tape, while only 4.8 microns in thickness, is highly reliable
and durable.


NT CIRCUITRY

While the design of the NT format in itself enables a high degree of miniaturization, the fact
that it is a digital audio recording system makes complex electrical circuitry inescapable.
Therefore, in order to realize the design goals of extreme compactness and low power
consumption, the NT-1 Digital Micro Recorder incorporates the latest circuit miniaturization
technologies.

Six new LSI chips, in particular, were developed expressly for NT format applications. Of these,
five are CMOS devices. These six chips are the equivalent of 1.8 million transistors.

1. DSP LSI contains digital over-sampling filters for the A/D and D/A converters,
error correction and concealment code encoder, decoder, modulator, demodulator,
and non-tracking processing circuitry; it is used in conjunction with an external 1
megabit dynamic RAM chip, which provides the necessary non-tracking and servo
buffers.

2. ADA LSI contains the A/D and D/A converters plus all ancillary analog and digital
circuits.

3. DET LSI contains digital circuits for playback RF equalization, PLL and associated
functions.





4. DRV LSI contains a high performance DC-to-DC converter, power supply regulators,
and motor driver circuitry.

5. Micro-CTL LSI contains a microprocessor that performs calculations for motor
servo and system control and controls the LCD readout.

6. R/P IC contains the RF record and playback amplifiers.

The LSI chips in the Digital Micro Recorder are interconnected via the SSB (simple serial bus).
This unique architecture enables the exchange of large volumes of data between the central
microprocessor and the individual LSI circuits. At the same time, it reduces the number of
required pins on the LSI chips and the number of signal paths on the circuit board. SSB thus
facilitates real time control of numerous functions while reducing circuit complexity. As an
added benefit, the simplified signal paths decrease the generation and induction of transient
noise.


The NT Digital Micro Tape Cassette is ultra-compact, about the size of a postage stamp. It is
30mm wide, 21.5mm deep, and 5mm thick, making its volume approximately 1/4 that of a
microcassette and 1/25 that of a compact cassette. Figure 9 depicts relative cassette sizes.
Notwithstanding these diminutive dimensions, the NT cassette provides a maximum
record/play time of 120 minutes.
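
The volume comparison can be checked roughly. The NT cassette depth and thickness are those given above, with the width read as 30 mm; the microcassette and compact cassette dimensions are nominal outside dimensions assumed for this sketch and do not come from this paper.

```python
def volume_mm3(width, depth, thickness):
    """Volume of the bounding box of a cassette shell, in cubic millimetres."""
    return width * depth * thickness


# NT cassette dimensions (width read as 30 mm).
nt = volume_mm3(30, 21.5, 5)

# Assumed nominal dimensions, not taken from this paper.
microcassette = volume_mm3(50, 33, 8)
compact_cassette = volume_mm3(100, 64, 12)

ratio_micro = nt / microcassette
ratio_compact = nt / compact_cassette

print(f"NT cassette volume: {nt:.0f} mm^3")
print(f"relative to microcassette: 1/{1 / ratio_micro:.1f}")
print(f"relative to compact cassette: 1/{1 / ratio_compact:.1f}")
```

With these assumed dimensions the ratios come out close to the quoted 1/4 and 1/25; the exact values depend on which outside dimensions are used for the larger cassettes.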

As explained earlier, the cassette shell incorporates self-aligning tape guides and pressure
roller elements that are external to the cassette in conventional rotary-head recording
systems. These components are, of course, central to the NT format's non-loading system. The
lid mechanism is a relatively simple construction that assures a perfect seal, effectively
keeping contaminants out of the cassette.

NT format development from the outset has been based on the use of metal-evaporated tapes
because the medium is in several ways ideal for digital recording:

1. In metal-evaporated tapes, only the active magnetic component is deposited onto
the base film. This is in contrast to metal particle, oxide, or ferrite formulations,
which require an inert binder material. Thus, the magnetic layer of a metal-evaporated
tape can be extremely thin. As a result, the recording field can penetrate
the entire thickness of the magnetic layer. This makes it possible to overwrite data
with 100 per cent erasure, thereby eliminating the need for a separate erase head.

The absence of inert binder material in metal-evaporated tapes results in higher
magnetic material density, and higher output levels and C/N (carrier-to-noise)
ratio.

2. When recording on metal-evaporated tape, the unit magnetizing length (one half the
shortest wavelength) is greater than the thickness of the magnetic layer. Under
these conditions, self-demagnetization is reduced as compared to that obtained in
particulate media. Because of this reduced self-demagnetization, a lower coercivity
is needed than in metal particulate tape in order to sustain similar high recording
densities. This means that record head pole-tip saturation is less likely to occur.

The dual-layer metal-evaporated formulation used in the NT Digital Micro tape has been
optimized for bi-directional record/play. This necessitated a double metal-evaporated layer.
The direction of tilt of the columnar structure of the evaporated film is set by the incident angle
of the metal vapor. The highest playback output is obtained when the columns tilt in the direction
of head motion. This is illustrated in Figure 10. An abrasion-free backcoating prevents wear
of the built-in plastic tape guides. A low-abrasion protective coating on the magnetic layer
ensures tape stability and prevents premature head wear.





The use of microprocessor control has eliminated the need for trimpots and other similar
devices. All control data are held in a non-volatile random access memory so that
calibrations and adjustments are retained in the absence of power. Without mechanical
trimmers, circuit boards can be smaller, and problems stemming from contact failure or drift
are eliminated.

All electrical components, including the LSI chips, resistors, capacitors, and inductors, are
mounted on the surface of a highly flexible circuit board. Because the board can be folded, parts
that are usually remotely located, such as the headphone and microphone jacks, switches, and
LCD, can now be mounted directly on the board as well. This design not only aids
miniaturization but also reduces the number of components that can lead to noise or
reliability problems.


CONCLUSION

This report covers the introduction of the prototype ultra small NT recorder. The future should
bring even greater recorder miniaturization and reduced power consumption concurrent with
progress in the semiconductor industry. In fact, some expect to see volume and power
consumption reduced to approximately 1/10 of their current state. Furthermore, the
possibility of reducing cassettes to 1/25th the size of compact cassettes is very appealing in this
age of environmental concerns, which values energy savings and limited use of natural
resources. These concerns, which took root in the past decade, are likely to continue into
future generations.





Figure 1. Non-loading Method (cross-section of cassette and non-loading format tape passage)

Figure 4. Data Block Structure (one track along the scan direction: 92 main data blocks plus 6 inner repeated and 6 outer repeated data blocks; each block carries data, parity, and an error detection code)

(Block diagram labels recovered without captions: detector, digital LPF, offset, reference clock pulse circuit, playback circuit, write data, read address, buffer memory, quartz, read data to the decoder)

Figure 7. Examples of LDM-2 Modulated Waveform (input data and modulator output; 1 T is 1 bit length of input data; minimal flux reversal interval 1 T, maximal flux reversal interval 2.5 T)

Figure 8. Tape Pattern

Figure 9. Cassette Size Comparison (Digital Micro shown actual size)

Figure 10. Output Level Difference between Tape Running Directions (metal vapor, evaporation layer with columnar structure, double layer)




N93-80462


RAID 7 Disk Array



ABSTRACT


Each RAID level reflects a different design architecture. Associated with each is a backdrop of
imposed limitations, as well as possibilities which may be exploited within the architectural
constraints of that level. There are three (3) unique features that differentiate RAID 7 from all
other levels.

(1) RAID 7 is asynchronous with respect to usage of I/O data paths. Each I/O drive
(includes all data and one parity drive) as well as each host interface (there may be
multiple host interfaces) has independent control and data paths. This means that
each can be accessed completely independently of the other. This is facilitated by a
separate device cache for each device/interface as well.

(2) RAID 7 is asynchronous with respect to device hierarchy and data bus utilization.
Each drive and each interface is connected to a high speed data bus controlled by the
embedded operating system to make independent transfers to and from central
cache.

(3) RAID 7 is asynchronous with respect to the operation of an embedded real time
process oriented operating system. This means that exclusive and independent of
the host, or multiple host paths, the embedded OS manages all I/O transfers
asynchronously across the data and parity drives.


A key factor to consider is the RAID 7's ability to anticipate and match
host I/O usage patterns. This yields the following benefits over RAIDs built
around micro-code based architectures.

RAID 7 appears to the host as a normally connected Big Fast Disk (BFD).

RAID 7 appears, from the perspective of the individual disk devices, to minimize the
total number of accesses and optimize read/write transfer requests.

RAID 7 smoothly integrates the random demands of independent users with the
principles of spatial and temporal locality. This optimizes small, large, and time
sequenced I/O requests, which results in users having an I/O performance which
approaches that of main memory.


Sustained Host I/O Transfer Rates 

The real issue as far as RAID I/O performance is concerned is the sustained transfer rate to the
host. In the RAID 7 device the data drives represent the available bandwidth to store data. If
the number of data drives were to be five (5) and those drives were capable of a sustained
transfer rate of 200 Kbytes/sec, then RAID 7 could offer the host a 5 x 0.95 x 200 Kbytes/sec or
950 Kbytes/sec sustained transfer rate. It is significant that, unlike other RAID levels, RAID 7
offers a linear increase in sustained transfer capacity as the number of drives increases.
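
The calculation above can be written out as a small function. The 0.95 factor and the 200 Kbytes/sec drive rate are the figures quoted in the text; the function name and the interpretation of 0.95 as an efficiency factor applied uniformly to every drive are assumptions.

```python
def sustained_host_rate(data_drives, drive_rate_kbytes_per_sec, efficiency=0.95):
    """Sustained host transfer rate in Kbytes/sec under the linear-scaling claim."""
    return data_drives * efficiency * drive_rate_kbytes_per_sec


five_drive_rate = sustained_host_rate(5, 200)
print(f"5 data drives: {five_drive_rate:.0f} Kbytes/sec")

# Under the stated linear scaling, doubling the drives doubles the rate.
ten_drive_rate = sustained_host_rate(10, 200)
print(f"10 data drives: {ten_drive_rate:.0f} Kbytes/sec")
```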







Single Spindle Read/Writes


One simple measure of a RAID device ought to be how it answers the following two questions: (1)
Can the RAID perform small reads and writes better than a single spindle? (2) Can the RAID
perform large reads and writes better than a single spindle? RAID 5, for example, cannot match
single spindle performance for large writes, and for some small writes can muster only 1/20th
of single spindle performance. RAID 7, however, exceeds single spindle performance in all
cases.

Most of the published 'White Papers' on RAID performance compare different measures with
different architectures. For example, Mbytes/sec are used to evaluate RAID 3 while I/Os per
second are used to measure RAID 5. The problem with such comparisons is twofold: (1) they do
not match real world systems, which almost always have a continuous mix of small and large
requests; (2) they mask the performance of the untested measure.


System Configurations


Series A          Series B          Series C          Rackmount

8 Logic Slots     14 Logic Slots    20 Logic Slots    8 Logic Slots
8 Drive Slots     16 Drive Slots    24 Drive Slots    9 Drive Slots
3.5" Only         5.25" and 3.5"    5.25" and 3.5"    3.5" Only
600 Watt FT       1200 Watt FT      1800 Watt FT      600 Watt FT
3 Device I/Os     6 Device I/Os     12 Device I/Os    3 Device I/Os


• uses industry standard SCSI disks
• upgradable to 256 Mbytes cache
• multiple host scalability
• completely transparent: load and go
• requires no special software or drivers
































N93-80463 


Tutorial: 

Performance and Reliability in Redundant Disk Arrays

A disk array is a collection of physically small magnetic disks that is packaged as a single
unit but operates in parallel. Disk arrays capitalize on the availability of small-diameter disks
from a price-competitive market to provide the cost, volume, and capacity of current disk systems
but many times their performance. Unfortunately, relative to current disk systems, the larger
number of components in disk arrays leads to higher rates of failure. To tolerate failures, redundant
disk arrays devote a fraction of their capacity to an encoding of their information. This
redundant information enables the contents of a failed disk to be recovered from the contents of
non-failed disks. In this tutorial I will highlight the simplest and least expensive encoding for
this redundancy, known as parity. In addition to compensating for the higher failure rates of
disk arrays, redundancy allows highly reliable secondary storage systems to be built much more
cost-effectively than is now achieved in conventional duplicated disks.

Disk arrays that combine redundancy with the parallelism of many small-diameter disks are
often called Redundant Arrays of Inexpensive Disks (RAID). This combination promises
improvements to both the performance and the reliability of secondary storage. For example,
Table 1 compares IBM's premier disk product, the IBM 3390, to a redundant disk array
constructed of 84 IBM 0661 3½-inch disks. The redundant disk array has comparable or superior
values for each of the metrics given in Table 1 and appears likely to cost less.

In the first section of this tutorial I explain how disk arrays exploit the emergence of
high-performance, small magnetic disks to provide cost-effective disk parallelism that combats the
access and transfer gap problems. The flexibility of disk-array configurations benefits
manufacturer and consumer alike. In contrast, I describe in this tutorial's second half how parallelism,
achieved through increasing numbers of components, causes overall failure rates to rise.
Redundant disk arrays overcome this threat to data reliability by ensuring that data remains available
during and after component failures.

The organization of redundant data in a disk array can be treated as
a coding problem. The redundancy internal to a disk corrects non-catastrophic failures and
identifies catastrophic failures, whereas redundancy at the disk-array level corrects catastrophic
disk failures. Codes as simple as parity, which is not a single error-correcting code, can provide
single-failure protection because of this internal redundancy and its ability to identify failed
disks. Mirroring, the traditional mechanism for single-erasure correction in disk subsystems, has
high overhead costs that can be reduced with N+1-parity codes. The characteristics of these
N+1-parity codes depend on the organization of user data in the array. Although some self-


^ This material describes a tutorial, whose slides are included, largely derived from my University of
California at Berkeley dissertation, Redundant Disk Arrays: Reliable, Parallel Secondary Storage, to be
published by MIT Press. This research was funded by NSF grant MIP-8715235, NASA/DARPA grant
NAG 2-591, a Computer Measurement Group fellowship, and an IBM predoctoral fellowship.




Garth A. Gibson 
School of Computer Science 
Carnegie Mellon University 
5000 Forbes Ave., Pittsburgh PA 15213-3890





Metric                                   IBM 3390     Redundant Disk Array

Disk Units                               1            70+7+7
Formatted User Data Capacity (MB)        22,700       22,400
Number of Useful Actuators               12           77
Avg. Access Time (msec)                  19.7         19.8
Max. Read I/Os/sec/Box                   609          3,889
Max. Write I/Os/sec/Box                  609          972
Max. Transfer Rate (MB/sec)              15           130
Disk Power Consumption (W)               2,900        1,000
Volume for Disks (cubic feet)            97           11
Mean Time To Data Loss (1,000 hours)     50-250       6,600
Component Disk Costs ($1,000)            ?            67
Customer Price ($1,000)                  156-260      ?


Table 1: Comparison of a Strawman Redundant Disk Array to an IBM 3390. A "strawman"
redundant disk array constructed with 84 IBM 0661 model 370 (3½-inch) disks has many
advantages over IBM's top-end disk product, the IBM 3390. It has the user capacity of 70 disks; its
overhead is 7 disks (10%) for redundant data and 7 disks (10%) for on-line spares. Because parity data
is distributed among 77 of the disks and because user data is not stored on spare disks, only 77
disks contribute to its performance. For the maximum I/O accesses per second calculation, the
transfer unit is a single sector. For the maximum transfer rate calculation, the transfer unit is a
track from every disk that contains user data (77 disks). Most metrics apply to disk components
only and may be degraded when controller and host effects are included. The IBM 3390 mean time
to failure is not publicly known but can be expected to be better than IBM's previous top-end
product, which is reported to have had a mean time to failure of 53,000 hours. To compare costs
(based on 1990-1991 data), I show the price a disk array manufacturer would pay for comparable
3½-inch disks from Seagate and the price range that IBM's best customers pay for a maximally
configured IBM 3390 and half of an IBM 3990 (disk controller).

tuning database applications prefer not to automatically stripe data, most disk arrays rely on
striping to improve performance by balancing the load across disks and enabling the parallel transfer
of large requests. Byte-interleaved striping provides increased transfer bandwidth without
increasing access bandwidth in a manner analogous to, but more flexibly than, the way that
parallel transfer disks increase transfer but not access bandwidth. In contrast, block-interleaved
striping provides both high-transfer and high-access bandwidth at the cost of greater software
complexity.

More complex and expensive codes can be used to provide multiple-failure correction in 
very large or very reliable disk arrays, but these will not be addressed here. 

In this tutorial's second section, I review the performance expectations for non-redundant
disk arrays. Disk arrays derive their performance advantages by "striping" the data across
multiple disks. The greatest benefit of striping is that it decreases transfer times for large requests. In
addition, striping automatically distributes independent accesses to balance the workload across
disks. Because each disk access involves substantial overhead, the unit of striping must be
carefully chosen to avoid a mismatch with the array's workload. A striping unit size with wide
success in the absence of workload knowledge is about the capacity of one track. For workloads that
emphasize large sequential transfers, byte-interleaved striping with synchronized rotations and
seeks offers the largest decreases in response time. However, byte-interleaved organizations have
a much lower throughput for small random accesses. A block-interleaved striping organization
provides response times nearly as low as, and access throughput much higher than,
byte-interleaved organizations.

Redundant data reduces some of the performance benefits of data striping, however,
because this redundant data must be updated as user data is updated. In this tutorial's third
section I address the performance penalties associated with maintaining redundant data encodings.
Without assistance from file system or application software, the main penalty to performance is
as little as one and as much as three extra accesses that must be performed with every small,
random access. In contrast, with a file system that groups small write accesses into large write
accesses, an N+1-parity redundant disk array with block-interleaved striping can provide nearly
all of the performance of its disks as well as inexpensive, high reliability. Other, less complete
solutions to the performance penalty associated with small random accesses include caching,
application hints, and floating parity organizations. With the performance expectations outlined
in these sections and the much lower cost for redundant data, an N+1-parity disk array with
block-interleaved striping is the best organization for a single-erasure-correcting redundant disk
array.
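
The extra accesses for a small write can be illustrated with a minimal Python sketch. It assumes a read-modify-write parity update over byte blocks; the function names `xor_blocks` and `small_write_parity` and the sample block contents are hypothetical, not taken from the tutorial.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity for a read-modify-write small write.

    Disk accesses implied: read old data, read old parity,
    write new data, write new parity (three more than a plain write).
    """
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical stripe of three data blocks plus one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = xor_blocks(*stripe)

# Overwrite one block and update parity without touching the other data blocks.
new_block = b"\xff\x00"
new_parity = small_write_parity(stripe[1], parity, new_block)
stripe[1] = new_block

# The incrementally updated parity matches parity recomputed from scratch.
assert new_parity == xor_blocks(*stripe)
```

When the old data or old parity is already cached, some of the reads disappear, which is one way to read the "as little as one" extra access mentioned above.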

Finally, before turning to disk array reliability, I discuss the characteristics of disk lifetimes.
Although anecdotes of disk failure models abound, little concrete data has been widely published,
and there is no consensus among the many vigorously pressed opinions. Yet the distribution of
magnetic disk lifetimes is critical to the proper design of failure-tolerant disk systems. To set the
stage for an examination of disk array reliability, I offer an analysis of two particular populations
of 5¼-inch disks observed over 18 months beginning in 1987. These two populations, totaling
1350 disks, have significantly different lifetime distributions, probably derived from the greater
maturity of the manufacturing process for the older of these two disk models. For example,
assuming an exponential distribution for lifetimes and ignoring failures during an initial
"breaking in" period, the mean time to failure (MTTF) of the newer product is 115,000 hours while the
older product has a 368,000 hour MTTF. The appropriateness of an exponential model for disk
lifetime distributions is important to the final section's disk array reliability results. The data I
present provide reasonable evidence that the lifetimes of the more
mature of these products can be modeled by an exponential distribution with a mean lifetime of
over 200,000 hours. For the less mature of these products, there is evidence that an exponential
random variable is too simplistic a model, although it cannot be ruled out.

In the last section of this tutorial, I seek to facilitate the cost-effective design of reliable
secondary storage by presenting analytic models of the reliability of redundant disk arrays. The
models include a wide spectrum of disk array designs so that individual designers will be able to
characterize the reliability of the system they want to build. The most fundamental model
considers the effect of independent, random disk failures on an array's data lifetime; it is based on a
well-studied Markov model and yields a simple expression for reliability. Another model yields
an analytic expression for reliability by solving separate submodels for data loss derived from
multiple-disk failure causes such as those induced by sharing interconnect, controller, cooling,
and power-supply hardware and concurrent, independent disk failures. Although N+1-parity
protection only ensures the correction of a single disk in a parity group, disk arrays can be organized
so that each disk in a support-hardware group is contained in a distinct parity group. In this way,
dependent disk failures are tolerable because they affect at most one disk per parity group. These
models have been validated against a detailed disk-array lifetime simulator for a wide variety of
parameter selections. Agreement in most cases is within the simulator's 95% confidence interval.

The models I present in this tutorial show that a redundant disk array can easily be designed
to provide higher reliability than a single disk. Moreover, with a small overhead for parity and
spare disks, a redundant disk array can achieve very high reliability. For some configurations,
including my strawman configuration, an N+1-parity disk array with on-line spare disks achieves
higher reliability than the more expensive mirrored disk array. As more and more reliability is
required of more and more general purpose computer systems, reliability-cost tradeoffs will
become critical. The models and design implications discussed in this tutorial will enable
secondary storage system designers to achieve reliability goals with cost-effective redundant disk array
solutions.

For more information, see: 


(1) Peter M. Chen, Garth A. Gibson, Randy H. Katz, David A. Patterson, "An Evaluation of
Redundant Arrays of Disks Using an Amdahl 5890," Proceedings of the 1990 ACM
Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Boulder
CO, May 1990.

(2) Peter M. Chen, David A. Patterson, "Maximizing Performance in a Striped Disk Array,"
Proceedings of the 17th Annual International Symposium on Computer Architecture
(SIGARCH), Seattle WA, May 1990, pp 322-331.

(3) Ann L. Chervenak, Randy H. Katz, "Performance of a RAID Prototype," Proceedings of
the 1991 ACM Conference on Measurement and Modeling of Computer Systems
(SIGMETRICS), May 1991.

(4) Garth A. Gibson, Lisa Hellerstein, Richard M. Karp, Randy H. Katz, David A. Patterson,
"Coding Techniques for Handling Failures in Large Disk Arrays," Third International
Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS III), Boston MA, April 1989, pp 123-132.

(5) Garth A. Gibson, Redundant Disk Arrays: Reliable, Parallel Secondary Storage, PhD
dissertation, University of California at Berkeley, UCB/CSD 91/613, 1991. To be
published by MIT Press.

(6) Garth A. Gibson, David A. Patterson, "Designing Disk Arrays for High Data Reliability,"
Journal of Parallel and Distributed Computing, to appear, January 1993.

(7) Jim Gray, Bob Horst, Mark Walker, "Parity Striping of Disc Arrays: Low-Cost Reliable
Storage with Acceptable Throughput," Proceedings of the 16th International Conference
on Very Large Data Bases (VLDB), August 1990, pp 148-161.

(8) Mark Holland, Garth A. Gibson, "Parity Declustering for Continuous Operations in
Redundant Disk Arrays," Fifth International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS V), October 1992.

(9) Randy H. Katz, G. A. Gibson, D. A. Patterson, "Disk System Architectures for High
Performance Computing," Proceedings of the IEEE, Volume 77 (12), December 1989,
pp 1842-1858.

(10) Michelle Y. Kim, "Synchronized Disk Interleaving," IEEE Transactions on Computers,
Volume C-35 (11), November 1986.


(11) Edward K. Lee, Randy H. Katz, "Performance Consequences of Parity Placement in Disk
Arrays," Fourth International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS IV), Palo Alto CA, April 1991.

(12) M. Livny, S. Khoshafian, H. Boral, "Multi-disk Management Algorithms," Proceedings of
the 1987 ACM Conference on Measurement and Modeling of Computer Systems
(SIGMETRICS), May 1987.

(13) Jai Menon, Jim Kasson, "Methods for Improved Update Performance of Disk Arrays,"
Proceedings of the Hawaii International Conference on System Sciences, 1992.

(14) Richard R. Muntz, John C. S. Lui, "Performance Analysis of Disk Arrays Under Failure,"
Proceedings of the 16th International Conference on Very Large Data Bases (VLDB),
Dennis McLeod, Ron Sacks-Davis, Hans Schek (Eds.), Morgan Kaufmann Publishers,
August 1990, pp 162-173.

(15) John K. Ousterhout, Fred Douglis, "Beating the I/O Bottleneck: A Case for Log-Structured
File Systems," ACM Operating Systems Review, Volume 23 (1), January 1989, pp 11-28.

(16) David A. Patterson, Garth A. Gibson, Randy H. Katz, "A Case for Redundant Arrays of
Inexpensive Disks (RAID)," Proceedings of the 1988 ACM Conference on Management of
Data (SIGMOD), Chicago IL, June 1988, pp 109-116.

(17) A. L. Narasimha Reddy, Prithviraj Banerjee, "Evaluation of Multiple-Disk I/O Systems,"
IEEE Transactions on Computers, December 1989.

(18) Mendel Rosenblum, John K. Ousterhout, "The Design and Implementation of a
Log-Structured File System," Proceedings of the 13th ACM Symposium on Operating Systems
Principles, 1991.





NASA Goddard Conf. on Mass Storage Systems & Technologies

Redundant Disk Arrays
Performance and Reliability

Sept 22-24, 1992

Garth A. Gibson

School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3890


garth.gibson@cs.cmu.edu
412-268-5890
FAX: 412-681-5739



Outline 


Motivation 

Technology, performance, design leverage, cost 
Market activity 

Non-Redundant Disk Array Performance

Striping for concurrency or parallel transfer
Selecting stripe unit size

Redundant Disk Array Performance

Taxonomy and fundamental performance
Small write problem and solutions
Online reconstruction performance

Redundant Disk Array Reliability
Disk failures: lifetime data
Independent failures models
Dependent failures models
Simulation results



Motivation 


Little Disks Are Better


Driven by personal computer and laptop market 

Much larger market — lower profit margin 
more R&D amortized and required 

Inherent advantages 

lower mass to spin, to seek 
cooler operation 
higher resonant frequencies
=> tighter design tolerances
shorter stroke to seek




Trends Aggravate I/O Effects 


VLSI and multiprocessing trends
=> 50-100% / year for processors

Gordon Bell (CACM) predicts
=> 150% / year for supercomputers

but magnetic disk performance lags

< 5% / year access rate

< 4% / year data rate


by Amdahl's law for unequal speedups:
processor utilization decreases
=> I/O bottlenecks performance



Motivation

Parallelism via Arrays

analogue to multi-microprocessors

                  Single                      Disk Array
                  IBM 3380 K   IBM 3390       IBM 0661s (3.5")
                  (14")        (11")          50          70

Capacity (GB)     7.5          22.7           16          22.4        2X   1X
Actuators         4            12             50          70          12X  6X
Peak IO/s         200          600            2500        3500        12X  6X
Peak MB/s         12           16.8           85          140         7X   8X

order of magnitude gains possible

G. Gibson - 6



Motivation

Manufacturing Advantages

[Figure: product family of disk designs, up to the high end]

conventional: 4 disk design teams
disk array: 1 disk design team

G. Gibson - 7







Motivation 


Less Expensive 
High Reliability 


parity protected shadowing or mirroring 

33% overhead 100% overhead 












Motivation 


Worldwide Array Market Estimates

$ Billions

                   1990     1991     1992     1993     1994

IBM Mainframe      0        0.115    0.240    1.100    3.380
IBM AS/400         0        0.055    0.224    0.620    1.085
DEC VAX            0        0.002    0.070    0.120    0.780
Other Minis        0        0.013    0.130    0.330    0.560
Scientific         0.010    0.030    0.075    0.250    0.450
PC/LAN server      0.015    0.227    0.410    0.655    0.980
Unix/Net Server    0.015    0.037    0.120    0.275    0.530

Total              0.04     0.480    1.269    3.350    7.765

Source: Montgomery Securities, Dec 91 


G. Gibson - 9 




Motivation 


Recap 

CPU performance trends        disk technology trends

(redundant) disk arrays 

low cost reliability broad product family 

rapid market, product, and research growth 

G. Gibson - 10 






Striping 


Basic Performance 


Many Actuators

=> many IO/s if disk load is balanced
=> many MB/s if transfer is in parallel


Data Striping


disk 0 disk 1 disk 2 disk 3


small random access
=> uniform disk load

large sequential access
=> parallel transfer


Livny, Sigmetrics 87


G. Gibson - 11






Striping 

Striped Disk Array Performance

10 perfectly evenly loaded, synch disks

[Figure: two plots, "Response Time at low loads" and "Throughput at 50% utilization", each versus Request Size (KB)]

derived from Jim Gray, VLDB'90
unit of striping is important!

G. Gibson - 12




Striping 



Striping Unit Size 


Benefit 

decreased transfer time 
(stripe unit / transfer rate) 

Penalty 

additional seek + rotate 
(average seek + rotate) 


[Figures: two data layouts across disk 0 and disk 1]

Goal: rules of thumb

Metric: throughput

Experiment (Peter Chen, SigArch90)
16 synch disk simulator
stochastic workload

G. Gibson - 13



Striping


Known Concurrency Workload


Striping Unit = Slope x (concurrency - 1) + 1 sector
Slope = S x Positioning Time x Transfer Rate
(S = 1/4 for 16 disks)


G. Gibson - 14



Striping 


"Zero" Workload Knowledge 


[Figure: plot with legend entries exp4k con1, exp16k con1, exp4k con20, exp16k con20, norm1.5m con1, norm400k con10, and further norm entries that are partly illegible]

Striping Unit =

2/3 x Positioning Time x Transfer Rate
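
The two rules of thumb on these slides (known concurrency and "zero" workload knowledge) can be evaluated with a short Python sketch. The disk parameters used in the example (20 ms positioning time, 1.5 MB/s transfer rate, 512-byte sectors) are illustrative assumptions, not figures from the tutorial, and the function names are hypothetical.

```python
def striping_unit_known_concurrency(concurrency: int,
                                    positioning_time_s: float,
                                    transfer_rate_bytes_per_s: float,
                                    s: float = 0.25,
                                    sector_bytes: int = 512) -> float:
    """Striping Unit = Slope x (concurrency - 1) + 1 sector,
    with Slope = S x Positioning Time x Transfer Rate.
    S = 1/4 is the value the slide gives for 16 disks."""
    slope = s * positioning_time_s * transfer_rate_bytes_per_s
    return slope * (concurrency - 1) + sector_bytes


def striping_unit_unknown_workload(positioning_time_s: float,
                                   transfer_rate_bytes_per_s: float) -> float:
    """Striping Unit = 2/3 x Positioning Time x Transfer Rate."""
    return (2.0 / 3.0) * positioning_time_s * transfer_rate_bytes_per_s


# Assumed example disk: 20 ms positioning time, 1.5 MB/s transfer rate.
POSITIONING_S = 0.020
TRANSFER_BPS = 1.5e6

for concurrency in (1, 5, 20):
    unit = striping_unit_known_concurrency(concurrency, POSITIONING_S, TRANSFER_BPS)
    print(f"concurrency {concurrency:2d}: about {unit / 1024:.1f} KB")

unit = striping_unit_unknown_workload(POSITIONING_S, TRANSFER_BPS)
print(f"unknown workload: about {unit / 1024:.1f} KB")
```

With a concurrency of one, the first rule collapses to a single sector, which matches the intuition that a lone request benefits most from being spread over every disk.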

G. Gibson - 15




Striping 


Striping Performance Recap 


disk array has many actuators


striping utilizes parallelism 


balances disk loads 


provides parallel transfer 


striping unit sensitive to workload


workload concurrency is most important 


rules of thumb depend on simple disk specs 


G. Gibson - 16 







The Catch is Data Losses 


more parallelism 

more components 
more frequent failures 


70 disks, each 150 Khour MTTF 
exponential disk lifetimes 
mean time to data loss falls 
17 years to 89 days 
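
The figures on this slide follow from dividing the single-disk MTTF by the number of disks, which holds for independent, exponentially distributed lifetimes in an array with no redundancy. A minimal Python check (the function name is hypothetical):

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def nonredundant_mttdl_hours(disk_mttf_hours: float, n_disks: int) -> float:
    """Mean time to data loss when any single disk failure loses data
    and disk lifetimes are independent and exponential."""
    return disk_mttf_hours / n_disks


disk_mttf = 150_000  # hours, from the slide
array_mttdl = nonredundant_mttdl_hours(disk_mttf, 70)

print(f"single disk: {disk_mttf / HOURS_PER_YEAR:.1f} years")
print(f"70-disk array: {array_mttdl / HOURS_PER_DAY:.1f} days")
```

This prints roughly 17.1 years and 89.3 days, in line with the slide.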



Redundant Arrays of 


Inexpensive Disks (RAIDs) 



Organization Taxonomy 


Patterson, Gibson, Katz, Sigmod 88 

redundancy organization ? effect on performance ? 
using simple deterministic approximations
based on average access times 


RAID 1. Mirroring - Tandem 

replicate all data 
100 % overhead 
groups of 2 
write to both 
read from either 


[Figure: mirrored disk pair serving Read, Write, Read]


uses all bandwidth for reads 
but only half bandwidth on writes 


RAID

RAID 2. Bit Interleaved with Hamming Code
Connection Machine's Data Vault

[Figure: disks holding bit 1 ... bit N, check 1 ... check C]

■ check 1..C is Hamming code of bit 1..N

word size N    check bits C    overhead
8              4               50%
16             5               31%
32             6               19%

+ lower overhead
+ better large write bandwidth
+ soft error correction on the fly
- only 1 IO at a time across N+C disks
- unit of access is N times bigger
=> small writes must preread, merge,
   then overwrite all N+C disks
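
The check-bit counts in the table above are consistent with the standard bound for a single-error-correcting Hamming code, 2^C >= N + C + 1. The following Python sketch, with a hypothetical function name, reproduces the table under that assumption.

```python
def hamming_check_bits(n_data_bits: int) -> int:
    """Smallest C with 2**C >= N + C + 1 (single-error-correcting Hamming code)."""
    c = 1
    while 2 ** c < n_data_bits + c + 1:
        c += 1
    return c


for n in (8, 16, 32):
    c = hamming_check_bits(n)
    # Overhead is expressed relative to the data bits, as on the slide.
    print(f"N = {n:2d}  C = {c}  overhead = {100 * c / n:.0f}%")
```

The overhead shrinks as the word widens because C grows only logarithmically with N.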

G. Gibson - 19 






RAID 3. Bit Interleaved with Parity 
Maximum Strategy’s Strategy 2 


■ parity (C=1) is a single error detect code
but disk controller identifies failed disk
=> parity allows single error correction

after a disk failure, bitwise test:
parity( good disks ) = stored parity ?
if so, lost bit is 0, else 1


+ still lower overhead 

- same small access problems 

- same reduced parallelism 

(so RAID 2 can do double error correction) 
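
The bitwise test described on this slide amounts to XORing the surviving disks with the stored parity. A minimal Python sketch over byte blocks follows; the block contents and the name `xor_blocks` are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks and their parity disk.
data = [b"\x0f\xf0", b"\x33\xcc", b"\x55\xaa"]
parity = xor_blocks(data)

# The controller reports which disk failed, so its position is known (an erasure).
failed = 1
survivors = [block for i, block in enumerate(data) if i != failed]

# Each lost bit is 0 where parity(good disks) equals the stored parity, else 1,
# which is exactly the XOR of the survivors with the stored parity.
recovered = xor_blocks(survivors + [parity])
assert recovered == data[failed]
```

Knowing which disk failed is what lets a code that only detects single errors correct a single failure.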



RAID

RAID 5. Block Interleaved with Rotated Parity

Array Technology's RAID+

■ remap bits to disks so logical sector
is all on a single disk
=> small reads require only 1 disk
=> small writes require 4 IOs
since each data bit toggled requires
corresponding parity be toggled
=> parallel small writes block on parity disk
so spread parity across all disks

RAID 4.                    RAID 5.
d0   d1   d2   p0          d0   d1   d2   p0
d3   d4   d5   p1          d4   d5   p1   d3
d6   d7   d8   p2          d8   p2   d6   d7
d9   d10  d11  p3          p3   d9   d10  d11

+ same low overhead
+ almost full large access bandwidth
+ full small read bandwidth

- small write bandwidth is half mirroring
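
The rotated-parity placement shown for RAID 5 on this slide can be expressed as a small mapping function. The Python sketch below assumes the four-disk layout from the slide, in which parity moves one disk to the left on each row and data continues on the disk after the parity; the function names are hypothetical.

```python
def raid5_location(block: int, n_disks: int = 4):
    """Map a logical data block to (row, data disk, parity disk)."""
    data_per_row = n_disks - 1
    row, offset = divmod(block, data_per_row)
    parity_disk = (n_disks - 1 - row) % n_disks
    # Data blocks start on the disk after the parity disk and wrap around.
    data_disk = (parity_disk + 1 + offset) % n_disks
    return row, data_disk, parity_disk


def raid5_layout(n_rows: int, n_disks: int = 4):
    """Build the grid of labels (d0, d1, ..., p0, p1, ...) row by row."""
    grid = [[""] * n_disks for _ in range(n_rows)]
    for block in range(n_rows * (n_disks - 1)):
        row, data_disk, parity_disk = raid5_location(block, n_disks)
        grid[row][data_disk] = f"d{block}"
        grid[row][parity_disk] = f"p{row}"
    return grid


for row in raid5_layout(4):
    print("  ".join(f"{label:<3}" for label in row))
```

Because consecutive rows keep their parity on different disks, concurrent small writes to different rows do not queue on a single parity disk as they would in RAID 4.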


G. Gibson - 21



RAID 


Back of the Envelope
Maximum Throughput

relative to 16 disk non-redundant array

[Figure: bar chart on a 0% to 100% scale with bars C, R, r, W, w for Mirrored Disks, Byte Interleaved, and Block Interleaved]

C : User Capacity
R,W : large reads, writes
r,w : individual reads, writes

G. Gibson - 22



RAID 


Amdahl Measurements

P. Chen, Sigmetrics'90
20 Amdahl 6380s

[Figure: two plots of MB/s/disk versus % Reads (0 to 100) for Non-Redundant, Mirrored, and Block Interleaved arrays: Large Accesses (1.5 MB average) and Small Accesses (6 KB average)]

G. Gibson - 23



RAID 

Floating Parity Allocation 

Menon, Hawaii Syst Sci 92 

Small write problem is 4 accesses per write
preread and overwrite of data and parity

Dynamically reallocate parity each overwrite
overwrite takes 10%-20% rev vs 100% rev

=> preread and overwrite of parity in "1" access time

With 1 free track per cylinder (15 tracks) of parity
average distance to overwrite block is 1.6 blocks

Transaction processing: data preread hits in cache

=> small writes take 2 accesses: equal to mirroring !!


G. Gibson - 24




Log-Structured File System 


Rosenblum, SOSP 91 

Large, writeback file caches
=> dominant traffic is writes

Treat disk (array) as log; write only end of log
=> no seeks during writeback of many files

Delayed writeback

=> Group small writes into large writes
Small writes only when very idle

Log wrap around requires compaction

cost-benefit selection of region to compact
=> Sprite implementation experience
50% - 85% compacts on empty region




RAID 


Parity Declustering for Reconstruction 

Muntz, VLDB 90, and Holland, CMU TR 92 


virtual RAID                      physical array

reconstruction reads 100%         reconstruction reads 75%
of remaining 3 disks              of remaining 4 disks

mapping uses balanced incomplete block designs
=> faster reconstruction and/or
faster user access during reconstruction

smart (work reducing) algorithms lose
they cause excess seeks on replacement disk

G. Gibson - 26





RAID 


Recap 


rapidly improving compute speeds 


smaller but not faster disks 


striped disk arrays for performance 


increased failure rates 


mirroring
+ small IO/s


N+1 parity = RAID 5

+ large IO/s
+ low cost


floating parity and
log-structured file systems

parity declustering


G. Gibson - 27



Reliability 


Disk Lifetime Data 


collected Jan 89 through June 90 

two populations of 5.25" disks 

1) 859 disks, 350 MB 

2) 523 disks, 200 MB 

DOAs, customer burn-in failures, field failures

fit to Weibull lifetime distribution model
if shape is 1.0, lifetime is exponential

     95% conf. int.        exponential mle
     on Weibull shape      MTTF-disk

1)   0.59 - 1.04           80,000 hr
2)   0.62 - 1.30           338,000 hr

use exponential lifetime distribution model

G. Gibson - 28



Reliability

Reliability Modeling

P = XOR(A,B,C)
C = XOR(A,B,P)

disk failure rate: λ = 1/MTTF-disk
disk repair rate: μ = 1/MTTR-disk

[Figure: Markov model with failure rates G λ and (G-1) λ]

MTTF-disk >> MTTR-disk

MTTDL-RAID = (MTTF-disk)^2 / (N G (G-1) MTTR-disk)

7 groups (N) of 10+1 (G) disks
MTTF-disk = 150 Khr

MTTR-disk            2 week    3 day    1 day    4 hr
MTTDL-RAID (Mhr)     0.087     0.406    1.2      7.3
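
The table on this slide can be reproduced from the MTTDL-RAID expression, reading the numerator as the square of MTTF-disk. A short Python sketch follows; the function name is hypothetical, and the parameters are those given on the slide.

```python
def mttdl_raid_hours(mttf_disk: float, mttr_disk: float,
                     n_groups: int, group_size: int) -> float:
    """MTTDL-RAID = MTTF-disk^2 / (N * G * (G - 1) * MTTR-disk),
    valid when MTTF-disk is much larger than MTTR-disk."""
    return mttf_disk ** 2 / (n_groups * group_size * (group_size - 1) * mttr_disk)


MTTF_DISK = 150_000   # hours
N_GROUPS = 7          # N
GROUP_SIZE = 11       # G = 10 data disks + 1 parity disk

repair_times = {"2 week": 14 * 24, "3 day": 72, "1 day": 24, "4 hr": 4}
for label, mttr in repair_times.items():
    mttdl = mttdl_raid_hours(MTTF_DISK, mttr, N_GROUPS, GROUP_SIZE)
    print(f"MTTR-disk {label:>6}: MTTDL-RAID about {mttdl / 1e6:.3f} Mhr")
```

The inverse dependence on MTTR-disk is why faster repair, for example with on-line spares, raises data lifetime so sharply.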


G. Gibson - 29




Reliability


Disk Support Hardware 


Controller 



power source critical 


non-disk, non-AC 
46 Khour MTTF-string 


=> MTTDL-RAID < MTTF-disk ?


G. Gibson - 30




Reliability 

Orthogonal Parity Groups 



strings have separate power, cooling, cabling
string failure is one disk per parity group

150 Khr MTTF-string & 3 day MTTR-string

MTTR-disk     1 hr       4 hr       3 days
MTTDL-RAID    356 Khr    331 Khr    132 Khr

G. Gibson - 31















Reliability 


Reliability Comparison
Mirror Disks vs N+1+Spares Array

[Figure: two plots versus Mean String Repair Time (hours): Mean Data Lifetime (1000 hours) and 10yr Reliability]

mean disk and string lifetimes = 150,000 hrs
mean disk recovery time = 1 hr, immediate reorder

G. Gibson - 32


Reliability

Delayed Reorder and
Partial Strings of Spares

[Figure: two plots versus Maximum Spare Disks: Mean Data Lifetime (1000 hours) and 10yr Reliability]

1 spare is effective, 12 spares are very effective

mean disk and string lifetime = 150,000 hrs
disk recovery time = 1 hr, disk delivery time = 72 hrs
mean string repair time = 72 hrs

G. Gibson - 33



Recap Reliability 


disk lifetime distribution approx exponential
independent disk failures greatly overcome
↓
faster repair using spare disks
↓
support hardware failures devastating
↓
orthogonal RAID against string failures
↓
fast repair and spares very cost effective
RAIDs achieve high data reliability


but, watch out for poor reconstruction coverage



Summary 


technology pushing disk arrays
striping utilizes parallelism for performance 
redundancy required for reliability goals 
N+1 parity is cost effective RAID 
orthogonal RAID w/ spares highly reliable 

Topics not covered 

workstation/network architecture for arrays 
on-line data compression 
double+ failure correction 

G. Gibson - 35 





N93-30464

Striped Tertiary Storage Arrays


Ann L. Drapeau

Computer Science Department
University of California
571 Evans Hall
Berkeley, CA 94720


1 Introduction to Striping 

Data striping is a technique for increasing the throughput and reducing the response time of large accesses to a
storage system [12], [7], [8], [4]. In striped magnetic or optical disk arrays, a single file is striped or interleaved
across several disks; in a striped tape system, files are interleaved across tape cartridges. Because a striped file can
be accessed by several disk drives or tape readers in parallel, the sustained bandwidth to the file is greater than in
non-striped systems, where accesses to the file are restricted to a single device.

Gibson [4] gives an excellent discussion of striping in magnetic disk arrays, much of which can be generalized
to optical disk and magnetic tape arrays. Two methods of striping data are byte-interleaved and block-interleaved
striping. In a byte-interleaved system, files are interleaved a byte at a time across the collection of disks or cartridges
that make up a "stripe". In such a system, each device or cartridge in the stripe will be involved in every access.
This makes synchronization of the devices easy, but does not allow any parallelism among the drives or readers in
the stripe.

In a block-interleaved system, data interleaving is done in larger increments. The size of the interleaved block
may be chosen to optimize sustained bandwidth (as done by Chen and Patterson for disk arrays [1]) or to minimize
response time. In a block-interleaved system, several accesses to a stripe may occur in parallel if the individual
accesses are small enough that they don't involve all the disks or cartridges in the stripe. This potential parallelism
is an advantage of block-interleaved systems over byte-interleaved systems. This advantage may be offset, however, by
increased latency penalties; drives acting independently will become unsynchronized, and subsequent large accesses
involving several drives or cartridges will have to wait for the completion of the slowest device. Unless the devices
in a stripe are kept strictly synchronized, a striped system will have longer positioning latencies than a non-striped
system.

Failures are more frequent in systems with many components. In large storage arrays, potential failures include
transient media errors, media wear, head failure, other mechanical problems with the device, and breakdown of
controllers, power supplies or cables [13]. To ensure adequate reliability of storage arrays, some form of error
correction encoding must be maintained in the array. Although it is not necessary to perform striping to include
such redundancy information [6], it is convenient to calculate error correction codes over a stripe.

The choice of an error correcting code for a storage array is based on its ability to protect the data against likely
errors and on minimizing the impact of the code on the performance and capacity of the array [1]. Performance of
write operations is affected by the addition of ECC, since extra redundancy calculations and extra write operations
to store the error correction information must be performed. Also, the choice of ECC will affect performance when
data is being reconstructed after a disk or cartridge failure. The ECC chosen will also affect the amount of useful
data storage on the array, since redundancy information must be stored in place of other data.

Gibson [4] showed that for disk arrays, single bit parity provides good data reliability as long as sufficient empty 
or "spare" space is left in the array for reconstructing data in the event of disk failures. We will perform a similar 
reliability analysis for tape arrays. Our intuition is that tape arrays will require more redundancy than simple parity. 
As will be discussed in detail in Section 3, magnetic tape systems face more difficult reliability challenges than disk 
drives. Media and head wear problems as well as the occurrence of errors uncorrectable by ECC make it likely that 
errors will occur more frequently in large tape systems than in disk arrays. It is likely that a more powerful error 
correcting scheme than simple parity will be needed to protect against these errors. 
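
As a point of reference for what "simple parity" means, the following is a minimal sketch of bytewise XOR parity computed over the blocks of one stripe. The function names and sample data are hypothetical. The scheme can rebuild only one lost block per stripe, which is the limitation that motivates considering stronger codes for tape.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"tape", b"disk", b"robo"]   # one block per device in the stripe
p = parity_block(data)
rebuilt = reconstruct([data[0], data[2]], p)   # the device holding data[1] failed
```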

In the sections that follow, we argue that applying striping to tertiary storage systems will provide needed 
performance and reliability benefits. Section 2 will discuss the performance benefits of striping for applications 
using large tertiary storage systems. It will introduce commonly available tape drives and libraries, and discuss their 
performance limitations, especially focusing on the long latency of tape accesses. This section will also describe an 
event-driven tertiary storage array simulator that we are using to understand the best ways of configuring these 
storage arrays. Section 3 will discuss the reliability problems of magnetic tape devices, and describe our plans for 
modeling the overall reliability of striped tertiary storage arrays to identify the amount of error correction required. 
Finally, Section 4 will discuss work being done by other members of the Sequoia group to address latency of accesses, 
optimizing tertiary storage arrays that perform mostly writes, and compression. 

2 Striping for Performance 

In this section, we argue that striping is needed in large tertiary storage systems because a growing number of 
applications require tertiary storage systems with high sustained throughput. Striped systems will provide this 
throughput better than currently available devices and libraries can. Examples of applications requiring high sustained 
throughput (up to hundreds of Megabytes per second) include those that use traditional archival systems to store results 
of large calculations, satellite and seismic data, and records of financial institutions. They also include applications 
using large amounts of video, and library applications that try to give a user acceptable response time on queries of 
large data sets. 

2.1 Tertiary Storage Devices 

Currently available tertiary storage devices don't offer high sustained throughput. Figure 1 shows some of the 
magnetic tape drives and one magneto-optical disk currently on the market. It compares their capacity, bandwidth 
and approximate drive cost. The magnetic tape drives can be divided into helical scan recording and linear recording 
devices. Of the helical scan devices, the 4mm DAT and 8mm technologies are low cost and high capacity. However, 
they have quite low bandwidth (0.5 MBytes/sec or less). In addition, as with all the magnetic tape devices, access 
time (that is, time to position the tape and read or write a particular bit on the tape) is long. Access time will 
be discussed further below. In the mid-range of cost for helical scan drives is the Metrum VLDS technology. This 
device has good capacity (15 GBytes/cartridge) but its bandwidth (1.2 MBytes/sec) is only a small improvement 
over the inexpensive drives. The graph also shows the 19mm DD2 technology, which is very expensive, but provides 
the best capacity and bandwidth (125 GBytes/cartridge and 15 MBytes/sec). It should be noted that even this 
device is incapable of supporting the high bandwidth (100 MBytes/sec or more) required by many applications 
without striping. Of the linear recording technologies, the 1/4" technology is inexpensive and high capacity but suffers from 
low bandwidth. The mid-priced 1/2" IBM 3490 technology has low capacity (480 MBytes/cartridge) but moderate 
bandwidth (6 MBytes/sec). Finally, the graph shows a 5.25" magneto-optical disk that is fairly low cost. The disk 
is lower in capacity than the tape drives and transfers at a fairly low rate (1.25 MBytes/sec). However, the access 
time on the magneto-optical drive is shorter than that of the magnetic tape drives by several orders of magnitude. 
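
To make the bandwidth gap concrete, the sketch below computes how many drives of each type would have to stream in parallel to reach 100 MBytes/sec. It uses the bandwidths quoted above and assumes ideal scaling, with no redundancy, positioning or synchronization overhead, so the counts are lower bounds. The dictionary keys and function name are illustrative.

```python
import math

# Sustained bandwidths in MBytes/sec, as quoted in the text above.
DRIVE_BANDWIDTH = {
    "Metrum VLDS": 1.2,
    "19mm DD2": 15.0,
    "1/2in IBM 3490": 6.0,
    "5.25in MO disk": 1.25,
}


def drives_needed(target_mb_per_sec, drive_mb_per_sec):
    """Smallest number of drives whose combined bandwidth meets the target."""
    return math.ceil(target_mb_per_sec / drive_mb_per_sec)


needed = {name: drives_needed(100, bw) for name, bw in DRIVE_BANDWIDTH.items()}
```

Under these assumptions even the fastest device listed needs a stripe of seven drives to sustain 100 MBytes/sec.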

These drives offer a wide range of performance and capacity. However, none of the drives can sustain bandwidth 
in the range of hundreds of Megabytes per second. In traditional archival systems, it is possible to get near the 
specification of sustained bandwidth for a particular device, since large files are written in their entirety and seldom 
re-read. In library applications, where accesses to the tertiary storage system are likely to be fairly random, maintaining 
high sustained throughput is more difficult, since tapes will be switched often. As will be described in the next section, 
access times on the drives are quite long: a minute or more for ejecting an old tape, loading a new tape and positioning 
in preparation for data transfer. In systems (like libraries) where random accesses occur, sustained throughput will 
be lower than the specified maximum for the drives. Striping will be particularly important in these systems to 
sustain reasonable throughput. 
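
The effect of access time on sustained throughput can be sketched with a simple model in which every request pays a fixed access time before transferring at the drive's streaming rate. The request size and the 60 second access time below are hypothetical values chosen only to illustrate the trend.

```python
def effective_throughput(request_mb, transfer_mb_per_sec, access_sec):
    """Sustained MBytes/sec when each request pays a fixed access time."""
    return request_mb / (access_sec + request_mb / transfer_mb_per_sec)


# A 100 MByte request on a 0.5 MBytes/sec drive.
streaming = effective_throughput(100, 0.5, access_sec=0)     # no tape switch
with_switch = effective_throughput(100, 0.5, access_sec=60)  # one-minute switch
```

In this example the tape switch costs roughly a quarter of the streaming rate, and the loss grows as requests get smaller.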



[Figure 1 is a scatter plot titled "Tertiary Storage Devices". Horizontal axis: Bandwidth (MBytes/sec), with ticks at 
1, 10 and 100. Vertical axis: Cartridge Capacity (GBytes), with ticks up to 1000. Plotted devices, labeled with 
approximate drive cost: 19mm D2, $300,000; VLDS, $40,000; 8mm, $2800; 4mm, $1000; 1/4", $750; 1/2", $20,000; 
5.25" MO Disk, $4000.] 

Figure 1: Drive capacity and bandwidth for magnetic tape and magneto-optical disk products. 



Operation                       4mm DAT    8mm Exabyte    0.5" Metrum 
Mean load time (sec)            16         35.4           28.3 
Mean eject time (sec)           17.3       16.5           3.8 
Rewind startup time (sec)       15.5       23             15 
Rewind rate (MB/sec)            23.1       42.0           350 
Search startup time (sec)       8          12.5           28 
Search rate (MB/sec)            23.7       36.2           115 
Read transfer rate (MB/sec)     0.17       0.47           1.2 
Write transfer rate (MB/sec)    0.17       0.48           1.2 

Table 1: Measurements of 4mm, 8mm and 0.5" helical scan magnetic tape drives. 


2.2 Tape Drive Measurements 

In order to better understand tape devices and robots, we performed detailed measurements of their operation. 
Measurements were made for three devices: an 8mm Exabyte drive, a 4mm WangDAT drive and a 0.5" VLDS 
Metrum drive. All three are helical scan magnetic tape drives. 

Table 1 summarizes the performance measurements made on the individual tape drives. The first two operations 
are mechanical: loading a tape into a drive and ejecting a tape from a drive. These operations are quite slow; part 
of the reason for this is the fairly complex mechanical manipulation of the tape in a helical scan 
system. On a load operation, additional time is spent reading format information from the start of the tape. On 
each of the three devices, the combination of a load and an eject operation takes at least 30 seconds. 

Rewind and forward search behavior on each of the devices can be characterized as fairly linear after an initial 
startup time. Table 1 shows the startup time and rewind and search rates for each device. Measurements of search 
and rewind rates were made for tapes written entirely with 10 MByte files. 

Finally, Table 1 shows the sustained read and write rates to a user process measured for each of the drives. In 
each case, the read and write bandwidth obtained are close to specifications, but the drive can easily perform much 
worse than this optimum. The devices must be kept streaming in order to achieve good bandwidth. Otherwise, in 
order to avoid tape and head wear, the drive will initially pull the head away from the tape and loosen tape tension, 
and then after a few more seconds of no activity, the drum will stop spinning. Later accesses will require spinning 
up the drum and reapplying tape tension. 

Table 2 shows an important access time parameter, "average seek" time, or the time to search over 1/3 of 
the volume of the tape. Such an average seek time may correspond poorly to actual workloads, but we will use it as 
an initial basis for comparison. 

Device          1/3 tape volume (GBytes)    Time (sec) 
4mm DAT         0.400                       25 
8mm EXB8500     1.5                         54 
Metrum VLDS     5                           70 

Table 2: Average seek times for each of the 4mm DAT, 8mm EXB8500 and Metrum VLDS drives, where the average seek 
is defined as being the time to search over 1/3 the volume of the tape cartridge. 
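
The average seek figures are consistent with the linear search behavior described earlier: a fixed startup time plus distance divided by search rate. The sketch below applies that model to the search parameters from Table 1, assuming 1 GByte = 1000 MBytes and reading the 8mm search rate as 36.2 MB/sec. The names are illustrative.

```python
# (search startup in sec, search rate in MB/sec, 1/3 tape volume in MBytes)
DRIVES = {
    "4mm DAT": (8.0, 23.7, 400),
    "8mm EXB8500": (12.5, 36.2, 1500),   # search rate is an assumed reading
    "Metrum VLDS": (28.0, 115.0, 5000),
}


def search_time(startup_sec, rate_mb_per_sec, distance_mb):
    """Linear model: fixed startup plus distance divided by search rate."""
    return startup_sec + distance_mb / rate_mb_per_sec


average_seek = {name: search_time(*params) for name, params in DRIVES.items()}
```

Under these assumptions the model gives roughly 25, 54 and 71 seconds, close to the measured average seek times.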

It should be noted from the tape drive measurements presented above that the access times on these devices are 
very long. An access requiring a tape switch will include a rewind of some portion of the old cartridge, an eject 
operation, two robot operations to remove one cartridge and grab the new cartridge, a load of the new cartridge 
and a search to position the new cartridge for data transfer. (The timing of robot operations will be discussed in 
the next section.) Given the long mechanical delays and search times for the tape devices measured here, an access 
that includes a cartridge switch can have a latency of several minutes. Tertiary storage arrays for applications that 
will perform random accessing of data must be carefully configured to attempt to overcome these serious latency 
problems. Section 2.4 will discuss our efforts to understand how best to configure tertiary storage arrays. 
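
The steps of a cartridge switch can be added up in a short sketch. It uses the 8mm Exabyte figures from Table 1 and the EXB-120 robot figures reported in Section 2.3, and makes several assumptions: the steps do not overlap, the robot arm moves twice at 1.5 seconds each, and both the rewind and the search cover 1/3 of a tape (1500 MBytes). The 36.2 MB/sec search rate is an assumed reading of Table 1.

```python
def linear_time(startup_sec, rate_mb_per_sec, distance_mb):
    """Fixed startup plus distance divided by rate."""
    return startup_sec + distance_mb / rate_mb_per_sec


def switch_latency(drive, robot, rewind_mb, search_mb):
    """Sum the steps of an access that needs a cartridge switch."""
    steps = {
        "rewind": linear_time(drive["rewind_startup"], drive["rewind_rate"], rewind_mb),
        "eject": drive["eject"],
        "robot": robot["grab"] + robot["push"] + 2 * robot["arm_move"],
        "load": drive["load"],
        "search": linear_time(drive["search_startup"], drive["search_rate"], search_mb),
    }
    return sum(steps.values()), steps


exabyte_8mm = {"load": 35.4, "eject": 16.5, "rewind_startup": 23.0, "rewind_rate": 42.0,
               "search_startup": 12.5, "search_rate": 36.2}
exb120 = {"grab": 19.2, "push": 21.4, "arm_move": 1.5}   # arm_move is an assumed midpoint

total_sec, breakdown = switch_latency(exabyte_8mm, exb120, rewind_mb=1500, search_mb=1500)
```

With these inputs the total comes to about three and a half minutes, in line with the "several minutes" estimate above.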


2.3 Automatic Handling of Cartridges 

To provide higher bandwidth and capacity than can be supplied by a single device, several companies have built 
automated library systems that hold tens or thousands of cartridges that can be loaded by robots into some number 
of magnetic tape or optical disk drives. Using several drives in a library increases the aggregate bandwidth; however, 
the bandwidth to or from an individual cartridge does not change. If files are restricted to a single data cartridge, the 
bandwidth to a particular file is limited to the bandwidth supported by a single device. Striping within or between 
these automated libraries removes this limitation on bandwidth to a single file by spreading accesses to the file across 
several devices. 

Table 3 shows a classification of some of the robots available for handling magnetic tape cartridges and optical disk 
platters automatically. Large libraries generally contain many cartridges, several drives and one or two robot arms 
for picking and placing cartridges. The cartridges are often arranged in a rectangular array. Other "large library" 
configurations include a hexagonal "silo" with cartridges and readers along the walls, and a library consisting of 
several cylindrical columns holding cartridges that rotate to position them. Usually these large libraries are quite 
expensive ($500,000 or more), but they often have low cost per MByte compared to less expensive robotic devices. 
Carousel devices are moderately priced (around $40,000) and hold around 30 cartridges. The carousel rotates to 
position the cartridge over a drive, and a robot arm pushes the cartridge into the drive. In most cases, there are one 
or two drives per carousel. Finally, the least expensive device ($10,000 or less) is a stacker, which holds around 10 
cartridges in a magazine and loads a single reader. The magazine may move vertically or horizontally to position 
a tape in front of the drive slot, or the stacker may have a robot arm which moves across the magazine to pick a 
cartridge. 

Type             No. Cartridges    No. Drives    No. Robot Arms           Cost 
Large Library    10 to 1000        several       one or two               high 
Carousel         around 50         one or two    one (carousel)           moderate 
Stacker          around 10         one           one (magazine or arm)    low 

Table 3: Devices for handling tertiary storage cartridges automatically. 

Time to grab cartridge from drive    19.2 sec 
Time to push cartridge into drive    21.4 sec 

Table 4: Times for robot to grab a cartridge from a drive and push a cartridge into a drive for the EXB-120 robot 
system. 

In order to develop a model for robot access time that could be included in our performance simulations, which 
will be described in the next section, we measured an Exabyte EXB-120 robot. This robot is a simple rectangular 
array of 116 cartridges and four tape drives. We measured robot arm movement time from various positions in the 
array. We found that robot arm movement varied between 1 and 2 seconds in the array. Since this time is so small 
compared to the latencies of tape accesses, we are modeling this as a constant value. We also measured the time to 
grab a cartridge from a drive as well as the time to push a cartridge into a drive. Table 4 shows these latter two 
measurements, both around 20 seconds. 

2.4 Performance Simulations 

To better understand how to configure striped tertiary storage arrays, we have written an event-driven array 
simulator. This simulator uses performance models for tape devices, optical disk drives and robots that are based on 
the device and robot measurements described in the previous sections. The simulator takes as input a set of 
parameters that are applied to general performance models to simulate the behavior of particular devices. In response to 
an input workload that includes request arrival, size and position distributions, the simulator calculates the mean 
response time and queueing delay for requests and specifies the sustained bandwidth and request rates provided for 
a particular configuration and workload. Preliminary simulation results will be discussed in my talk. 
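
The general shape of an event-driven simulation can be shown with a deliberately small sketch. It is not the simulator described above: it models a single drive serving requests in FIFO order, charges every request a fixed access time, and reports the same kinds of metrics. All names and the sample workload are hypothetical.

```python
import heapq

ARRIVAL, COMPLETION = 0, 1


def simulate(requests, transfer_mb_per_sec, access_sec):
    """Event-driven simulation of one drive serving requests in FIFO order.

    requests: non-empty list of (arrival_sec, size_mb) tuples.
    Every request pays a fixed access_sec before its data transfer starts.
    """
    events = [(arrival, ARRIVAL, i) for i, (arrival, _) in enumerate(requests)]
    heapq.heapify(events)
    waiting = []          # indices of requests queued behind the busy drive
    busy = False
    start, finish = {}, {}

    def begin(i, now):
        start[i] = now
        service = access_sec + requests[i][1] / transfer_mb_per_sec
        heapq.heappush(events, (now + service, COMPLETION, i))

    while events:
        now, kind, i = heapq.heappop(events)
        if kind == ARRIVAL:
            if busy:
                waiting.append(i)
            else:
                busy = True
                begin(i, now)
        else:
            finish[i] = now
            if waiting:
                begin(waiting.pop(0), now)
            else:
                busy = False

    n = len(requests)
    total_mb = sum(size for _, size in requests)
    elapsed = max(finish.values()) - min(arrival for arrival, _ in requests)
    return {
        "mean_response_sec": sum(finish[i] - requests[i][0] for i in range(n)) / n,
        "mean_queue_delay_sec": sum(start[i] - requests[i][0] for i in range(n)) / n,
        "sustained_mb_per_sec": total_mb / elapsed,
    }


stats = simulate([(0, 10), (5, 10), (100, 20)], transfer_mb_per_sec=1.0, access_sec=10.0)
```

A fuller model would replace the fixed access time with the rewind, eject, robot, load and search components measured earlier, and would schedule several drives and robot arms.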


Our performance simulations have two goals. First, we want to understand how best to configure striped tertiary 
storage arrays. This analysis is geared toward identifying performance bottlenecks in a system, and trying to overcome 
them. This might, for example, lead to the discovery that adding more readers to a system would dramatically 
improve performance. Depending on the applications to be run, an array might be designed for maximum sustained 
throughput or to minimize average latency, so the simulator notes both these metrics. 

The second goal of our simulations is to identify desirable properties of future drives and robots. For example, 
we can use the simulator to determine what the effect on performance would be if a particular drive's mechanical 
(load or eject) or fast search operations were twice as fast, or if its sustained bandwidth were doubled. We hope to 
identify a list of desirable properties that may influence companies building devices and robots to design components 
that are better-suited to perform well in tertiary storage arrays. 

3 Striping for Reliability 

Besides offering higher bandwidth than non-striped systems, striped systems that include redundancy also offer 
the potential for much-needed reliability improvements in large tertiary storage systems. Reliability issues for tape 
arrays are more complex than for disk array systems. Issues of particular concern for magnetic tape systems are 
uncorrectable bit error rates, tape wear and head wear. 

3.0.1 Tape Media Reliability 

The rate of raw errors (i.e., errors before any error correction has been performed) is quite high on magnetic tape 
media. Most of these errors are caused by "dropouts," in which the signal being sensed by the tape head drops 
below a readable value. Dropouts are most commonly caused by protrusions on the tape surface that temporarily 
increase the separation between the head and the tape, causing a loss in signal intensity [9]. The debris that becomes 
embedded in the tape and causes dropouts may come from loose pieces of substrate left on the surface when the 
tape is sliced, from the atmosphere or may accumulate from wear caused by contact between the head and media. 
Dropouts can also be caused by inhomogeneities in the tape's magnetic coating. 

Because of the high raw bit error rates on magnetic tape devices, all drives incorporate large amounts of internal 
error correction code. However, some errors will occur that the ECC cannot correct. The rate of such errors is called 
the Uncorrectable Bit Error Rate, and for current products, is around one uncorrectable bit error in every Terabyte 
of data read. When such an error is encountered, the entire data block on which ECC is performed is lost. 

Uncorrectable bit errors in the range of one per Terabyte are of particular concern in multi-Terabyte tertiary 
storage systems, since such systems WILL contain uncorrectable errors. In addition, if the system has a sustained 
read rate of 10 MBytes/sec, then an uncorrectable error will be encountered every 28 hours, on average. If data 
reliability is important in such systems, then the addition of redundancy information to the system is essential. 
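
The 28 hour figure follows directly from the error rate and the read rate. The sketch below reproduces the arithmetic, assuming decimal units (1 Terabyte = 10^12 bytes, 1 MByte = 10^6 bytes). The function name is illustrative.

```python
def hours_between_uncorrectable_errors(bytes_per_error, read_rate_bytes_per_sec):
    """Mean time between uncorrectable errors at a constant read rate, in hours."""
    return bytes_per_error / read_rate_bytes_per_sec / 3600.0


TERABYTE = 10**12   # decimal units assumed
MEGABYTE = 10**6

interval_hours = hours_between_uncorrectable_errors(TERABYTE, 10 * MEGABYTE)
```

Reading one Terabyte at 10 MBytes/sec takes 100,000 seconds, or about 27.8 hours.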

Magnetic tapes that are frequently overwritten eventually wear out. In a traditional archival system, where data 
is written and probably never read again, this is not a serious concern. However, in library applications there is no 
limit on the number of times a tape may be read. Tapes last on average several hundred passes [1]. However, they 
wear out even sooner if a particular segment of the tape is accessed repeatedly [5]. Linearly recorded tapes do not 
suffer so quickly from tape wear-out as tapes written by helical scan methods because the interface with the head is 
less abrasive, but wear is still a concern. In large tape libraries, algorithms must be developed to track the number 
of passes to a tape cartridge and replace it before wearout occurs. 
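
One simple form such pass tracking could take is sketched below. It is an illustration only, not an algorithm from this work: the class name, the pass limit of 300 and the 80% safety margin are hypothetical, and the sketch ignores the faster wear of repeatedly accessed tape segments noted above.

```python
class CartridgeWearTracker:
    """Count passes per cartridge and flag cartridges nearing wear-out."""

    def __init__(self, pass_limit, safety_margin=0.8):
        self.pass_limit = pass_limit          # assumed passes before wear-out
        self.safety_margin = safety_margin    # replace at this fraction of the limit
        self.passes = {}

    def record_pass(self, cartridge_id, count=1):
        self.passes[cartridge_id] = self.passes.get(cartridge_id, 0) + count

    def needs_replacement(self, cartridge_id):
        return self.passes.get(cartridge_id, 0) >= self.safety_margin * self.pass_limit

    def due_for_replacement(self):
        return sorted(c for c in self.passes if self.needs_replacement(c))


tracker = CartridgeWearTracker(pass_limit=300)
tracker.record_pass("A001", 250)
tracker.record_pass("A002", 40)
due = tracker.due_for_replacement()
```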

In an interactive library application, wear due to stops and starts on the tape is likely to be severe, since we will 
be performing random accesses to the tapes. Severe wear is manifested by large portions of the magnetic binding 
material flaking away from the tape backing. Such problems make large sections of a tape unreadable. 

3.0.2 Tape Head Wear 

Tape heads undergo considerable wear in all tape systems. They last for a few thousand hours of actual contact 
between the head and medium. Some tape wear is necessary in order to keep the heads in optimum condition 
[10]. Tape wear helps remove from the head particles that may have been transferred there from the tape surface 
or the atmosphere, or which came from the tape coating under conditions of friction or extremely high or low 
humidity. All tape drive manufacturers recommend periodic use of a cleaning cartridge to remove debris from the 
tape head. Eventually, the head wear becomes extreme. We are exploring algorithms for scheduling both cleaning 
and replacement of the heads to assure adequate reliability. 

3.0.3 Modeling Tape Array Reliability 

Head failure is the main cause of tape drive failure; however, the drive may also have other mechanical or electrical 
failures. Also, as mentioned in Section 1, reliability modeling of arrays must include modeling the failure rates of 
controllers, power supplies, cables, etc. 

We plan to analyze the reliability of tertiary storage arrays with the aim of determining how much error correction 
(single-bit parity or some form of Reed-Solomon coding) is necessary to ensure adequate reliability of the array. This 
work will make use of the RELI reliability simulator written by Garth Gibson [4], which estimates the mean time to 
data loss for particular disk array configurations. We plan to modify the simulator to perform a similar analysis for 
tertiary storage arrays. 



4 Other Research Issues 


There are a number of other research topics that are being pursued in the Sequoia group at U.C. Berkeley. 

A number of issues having to do with the best ways of configuring tertiary storage arrays have not been touched 
upon in the previous discussion. These include the decision of the best interleave unit for laying out data, and the 
best unit of data transfer for optimizing performance for either sustained bandwidth or number of accesses performed 
per minute. An additional issue is that of allocating buffer space needed to perform synchronization. 

Joel Fine is addressing the long latency of accesses to the tape system by looking at abstracts [3]. An abstract 
is a small subset of a data set that may be able to answer queries to the data set. Because the abstract is small, 
it can be stored on disk or retrieved fairly quickly from tape. If the abstract can provide a high enough "hit rate" 
(i.e., can satisfy a reasonable number of queries), then it is worthwhile to build the abstract, a process that can 
be computationally intensive and time-consuming. 

Carl Staelin and John Kohl are looking at applying the Log-Structured File System (LFS) ideas of Mendel 
Rosenblum's work [11] to tertiary storage arrays. LFS systems are write-optimized. The tape array system would 
be treated as a log. Therefore, writes would be performed sequentially. This is an attractive idea in a tape array 
system where data is seldom re-read, since it allows cartridges to be written sequentially and minimizes the number of 
time-consuming switches. Re-reading the data is less efficient in a log-structured system, since there is no guarantee 
that an entire file is written sequentially. 

Finally, we are looking at using compression in striped tertiary storage systems. Compression is appealing because 
effective bandwidth and capacity are increased when fewer bits are moved and stored. Although many magnetic tape 
drive manufacturers are now putting compression at device level, we are looking at compression in higher levels of the 
system. At a higher level, more is known about the nature of data produced by an application, and a compression 
algorithm appropriate to the data can be chosen. 

5 Summary 

Striping in tertiary storage array systems is a good idea, both for performance and reliability reasons. Striping 
offers the potential for higher bandwidth to a single file than can be achieved without striping. And, the additional 
redundancy available in a striped tape system can offset the reliability problems caused by tape and head wear 
and uncorrectable bit errors. We are performing simulations to understand the best ways to configure tape arrays 
composed of currently available devices and robots, and to understand desirable properties for future devices and 
robots. We are also modeling the reliability of tape array systems to understand how much ECC is needed to maintain 
adequate reliability, and are developing algorithms for maintaining and replacing tape heads and cartridges. 



References 


[1] Bharat Bhushan. Tribology and Mechanics of Magnetic Storage Devices. Springer-Verlag, New York, 

[2] Peter M. Chen and David A. Patterson. Maximizing performance in a striped disk array. In Proceedings 
International Symposium on Computer Architecture, May 1990. 

[3] Joel A. Fine, Thomas E. Anderson, Michael D. Dahlin, James Frew, Michael Olson, and David A. Patterson. 
Abstracts: A latency-hiding technique for high-capacity mass storage systems. Submitted to ASPLOS-V, March 
1992. 

[4] Garth Alan Gibson. Redundant Disk Arrays: Reliable, Parallel Secondary Storage. PhD thesis, U. C. Berkeley, 
April 1991. Technical Report No. UCB/CSD 91/613. 

[5] H. Goto, A. Asada, H. Chiba, T. Sampei, T. Noguchi, and M. Arakawa. A new concept of data/DAT system. 
IEEE Transactions on Consumer Electronics, 35(3), August 1989. 

[6] Jim Gray, Bob Horst, and Mark Walker. Parity striping of disc arrays: Low-cost reliable storage with acceptable 
throughput. In Proceedings Very Large Data Bases, pages 148-161, 1990. 

[7] M. Y. Kim. Synchronized disk interleaving. IEEE Transactions on Computers, C-35:978-988, November 1986. 

[8] M. Livny, S. Khoshafian, and H. Boral. Multi-disk management algorithms. In Proceedings SIGMETRICS, 
pages 69-77, May 1987. 

[9] C. Denis Mee and Eric D. Daniel, editors. Magnetic Recording, Volume II: Computer Data Storage. McGraw-Hill, 
New York, 1988. 

[10] C. Denis Mee and Eric D. Daniel, editors. Magnetic Recording, Volume III: Video, Audio, and Instrumentation 
Recording. McGraw-Hill, New York, 1988. 

[11] Mendel Rosenblum and John K. Ousterhout. The design and implementation of a log-structured file system. In 
Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991. 

[12] K. Salem and H. Garcia-Molina. Disk striping. In Proceedings IEEE Data Engineering, pages 336-342, February 
1986. 

[13] Martin Schulze, Garth Gibson, Randy H. Katz, and David A. Patterson. How reliable is a RAID? In Proceedings 
IEEE COMPCON, pages 118-123, Spring 1989. 



N93-80465 


NATIONAL MEDIA LABORATORY MEDIA TESTING RESULTS 



William Mularie 
National Media Laboratory 
P. O. Box 33015 
St. Paul, MN 55133-3015 


Government Concerns 




The government faces a crisis in data storage, analysis, archive and communication. The sheer 
quantity of data being poured into the government systems on a daily basis is overwhelming 
systems' ability to capture, analyze, disseminate and store critical information. Future systems 
requirements are even more formidable: with single government platforms having data rates of 
over 1 Gbit/sec, >Terabyte/day storage requirements, and with expected data archive lifetimes 
of over 10 years. The charter of the National Media Laboratory (NML) is to focus the resources 
of industry, government and academia on government needs in the evaluation, development 
and field support of advanced recording systems. 


The Model 


The National Lab concept was created in response to the government awareness that various 
aspects of critical systems acquisition and support were not being met by the traditional 
government/contractor relationships. It was recognized that the perspective and access to 
highly-leveraged resources gained from a closer relationship with a consortium of 
commercially-focused, global corporations could benefit many aspects of the government 
system procurement and support cycle. 

NML Continuing Tasks 

A key responsibility of the NML is to provide sustaining user support for government recording 
systems and archive of data. This involves field support to sites to: solve current systems and 
media problems; give assistance in defining media handling, shipping and storage 
methodologies; give advice and assistance in maintaining and recovering data in current archives; 
and provide assistance in determining the direction of system upgrades. 

NML, based upon our ongoing advanced tape evaluation tasks, is also involved in assisting 
Program Offices and defining recording media requirements necessary for reliable, next 
generation data recording and archive. NML has been responsible for raising issues relating to 
reliability and performance as international industry concerns. One example of this involves 
the archival stability of various types of magnetic pigments; as a result, manufacturers are 
finally focusing on providing archivally stable media for critical data applications. 

Industry/Government Cooperation 

The government's needs in advanced recording and storage lead commercial market 
requirements by 3 to 5 years, both in performance and archival data storage requirements. 
Joint government/industry participation in the National Media Lab benefits the government by 
providing highly-leveraged access to the vast resources of the supporting industry and 
university laboratories to help meet current and future government recording systems 
evaluation and support. 

The domestic recording industry (through NML technical reviews open to domestic industry 
participation) benefits from the focus on leading-edge requirements. This may assist in 
building competitiveness of the domestic recording industry in future global markets. Unless 
the U.S. manufacturers of advanced storage systems are provided with both common goals 
and a mechanism for focus and cooperation in designing a future system, the U.S. Government 
faces the real possibility of either a) having no acceptable method of capturing the vast 
amounts of data being collected or b) relying on an offshore source. 









N93-80466 


Evaluation of D-1 Tape and Cassette Characteristics: 
Moisture Content of Sony and Ampex D-1 Tapes 

When Delivered 




Gary Ashton 

National Media Laboratory 
P. O. Box 33015 
St. Paul, MN 55133-3015 






Commercial D-1 cassette tapes and their associated recorders were designed to operate in 
broadcast studios and record in accordance with the International Radio Consultative 
Committee (CCIR) 607 digital video standards. The D-1 recorder resulted in the Society of 
Motion Picture and Television Engineers (SMPTE) standards 224 to 228 and is the first digital 
video recorder to be standardized for the broadcast industry. The D-1 cassette and associated 
media are currently marketed for broadcast use. The recorder was redesigned for data 
applications and is in the early stages of being evaluated. The digital data formats used are 
specified in MIL-STD-2179 and the American National Standards Institute (ANSI) X3.175-1990 
standard. 

In early 1990, the National Media Laboratory (NML) was asked to study the effects of time, 
temperature, and relative humidity on commercial D-1 cassettes. The environmental range to 
be studied was the one selected for the Advanced Tactical Air Reconnaissance System (ATARS) 
program. Several discussions between NML personnel, ATARS representatives, recorder 
contractors, and other interested parties were held to decide upon the experimental plan to be 
implemented. Review meetings were held periodically during the course of the experiment. 

The experiments were designed to determine the dimensional stability of the media and 
cassette since this is one of the major limiting factors of helical recorders when the media or 
recorders are subjected to non-broadcasting environments. Measurements were also made to 
characterize each sample of cassettes to give preliminary information on which purchase 
specifications could be developed. 

The actual tests performed on the cassettes and media before and after aging fall into the 
general categories listed on the following page. 


Tests Before Aging: 

• Bulk magnetics 

• Surface roughness 

• Mechanical properties 

• Surface electrical resistivity 

• Thickness of the overall tape and each coating 

• Tape stiffness 

• Tape shrinkage at elevated temperatures for various times 

• Quality of tape edges (width, width variation, and weave values) 

• Test of commercial and custom packaging 

• Magnetic print-through 






Tests Before and After Aging: 

• Static and dynamic friction 

• Cassette operation and dimensions 

• D-1 recorder: 

Signal (RF) output at 40 MHz 
Noise output at 39 MHz 
Bit and burst error rates 
Tape tensions 


Tests Only After Aging: 

• D-1 recorder signal (RF) and noise output after 10 cycles 


Test Reports and Data on Diskettes Available: 

Reports were written detailing the technical background, methods, equipment used, and results 
of experiments performed by the National Media Laboratory to evaluate commercial D-1 
cassettes for use in a wide range of temperatures and humidities. The variables evaluated 
include manufacturer's lot, time, temperature, and humidity. Cassettes from two 
manufacturers, Sony and Ampex, were evaluated. These reports are listed below and are 
available by contacting: 

National Media Laboratory 

P.O. Box 33015 

Saint Paul, MN 55133-3015 

Phone: (612) 736-6183 

Fax: (612) 736-4430 


Test Reports: 

Ashton, Gary R. May 1992. Coating and Substrate Thickness of Sony and Ampex D-1 Tape. 
NML Test Report TR-0013. 

Ashton, Gary R. May 1992. Friction Characteristics of Ampex and Sony D-1 Tapes. NML Test 
Report TR-0009. 

Ashton, Gary R. February 1992. Initial Evaluation of D-1 Tape and Cassette Characteristics. 
NML Technical Report RE-0003. 

Ashton, Gary R. May 1992. M-H Meter Tests on Sony and Ampex D-1 Tape. NML Test Report 
TR-0011. 

Ashton, Gary R. May 1992. Magnetic Print-Through Effects in Sony and Ampex D-1 Tapes. 
NML Test Report TR-0015. 

Ashton, Gary R. May 1992. Modulus (Stress-Strain Curves) of Sony and Ampex D-1 Tape. NML 
Test Report TR-0006. 

Ashton, Gary R. May 1992. Packaging Plan for D-1 Cassettes. NML Test Report TR-0001. 





Ashton, Gary R. May 1992. Packaging Tests of Commercial D-1 Cassettes and Cases. NML Test 
Report TR-0002. 

Ashton, Gary R. May 1992. Relative Humidity of Sony and Ampex D-1 Tapes when Delivered. 
NML Test Report TR-0004. 

Ashton, Gary R. May 1992. Resistivity Characteristics of Ampex and Sony D-1 Tape. NML Test 
Report TR-0005. 

Ashton, Gary R. May 1992. Shrinkage of Sony and Ampex D-1 Tapes. NML Test Report TR-0008. 

Ashton, Gary R. May 1992. Stiffness of Sony and Ampex D-1 Tape. NML Test Report TR-0014. 

Ashton, Gary R. May 1992. Surface Roughness of Sony and Ampex D-1 Tapes. NML Test Report 
TR-0012. 

Ashton, Gary R. May 1992. Thermal and Hygroscopic Time Constants of Sony and Ampex D-1 
Tape Cassettes. NML Test Report TR-0016. 

Ashton, Gary R. May 1992. Vibrating Sample Magnetometer (VSM) Tests on Sony and Ampex 
D-1 Tape. NML Test Report TR-0010. 

Ashton, Gary R. May 1992. Width and Weave Characteristics of Sony and Ampex 
D-1 Tape. NML Test Report TR-0007. 


Data Available on Diskettes: 

Commercial D-1 Cassettes and Media Test Data: 1990-1991 Data. 

Commercial D-1 Cassettes, Media, and Packaging Test Data: 1991-1992 Data. 


As an example of the reports generated, the report Relative Humidity of Sony and Ampex D-1 
Tapes when Delivered has been included. The technique used to determine the relative 
humidity or moisture content of the cassettes as received from the manufacturer was developed 
by NML specifically for the ATARS program needs. The technique outlined in the attached 
report example is applicable to the problem of determining the moisture content of any flexible 
magnetic media for incoming inspection and quality control of archive conditions. The 
technique also clearly indicates the amount of time needed for the media to respond to changes 
in the relative humidity of the storage environment. 





1 Introduction 


1.1 Purpose of the Test 

The purpose of this test was to determine the moisture content of the tapes when 
delivered from the manufacturer. This is accomplished by determining the 
equilibrium relative humidity (RH) of the tapes at delivery. The equilibrium RH is 
the RH at which the tapes remain constant in weight over time. This is also an 
indicator of the RH of the environment in which the tapes were packaged. 

1.2 Items Tested 

This test examined D-1 digital video tapes from two manufacturers: 

Mfg     Model               Mfg Lot #      Test Lot
Ampex   219-M034 (medium)   11571          X
Ampex   219-M034 (medium)   11961          Y
Ampex   219-M034 (medium)   12201          Z
Ampex   219-L076 (large)    88134/88135    J
Sony    D1M-34 (medium)     NA32112A       T
Sony    D1M-34 (medium)     NA22113A       U
Sony    D1M-34 (medium)     NA40114A       V
Sony    D1L-76 (large)      NA92113A       L

In all cases, the cassettes were production items with unknown manufacturing 
dates. 

1.3 Test Requirements 

This test is required to understand the moisture content of the cassettes and 
magnetic tapes as they are received from the supplier. This information is 
important in determining the conditioning that is required before the cassettes can 
be used in a recorder, since there is a humidity or moisture content range for 
recorder operation. Tapes received with the moisture content required for operation 
can be used with little or no preconditioning. Tapes out of the required moisture 
content range may require conditioning in a controlled environment before use. 

2 Summary 

As shown in the following tables, all eight lots of tape tested were packaged at a 
moisture level corresponding to 45 to 55% relative humidity at 72°F. The variation 
between lots was smaller in the Ampex tapes than in the Sony tapes. 









Ampex D-1 Delivered RH (%) 

Size     Lot   Average        Std. Dev.
Large    J     (illegible)    (illegible)
Medium   X     52.63          0.6
Medium   Y     55.14          1.4
Medium   Z     51.10          1.3


Sony D-1 Delivered RH (%) 

Size     Lot   Average        Std. Dev.
Large    L     54.05          1.8
Medium   T     45.90          2.7
Medium   U     45.48          4.0
Medium   V     52.6           1.0


3 References 

For background information on the effect of relative humidity in magnetic tapes, 
see: 

Cuddihy, Edward F. 1976. Hygroscopic properties of magnetic recording tape. IEEE 
Transactions on Magnetics 12:2 (March) 126-35. 

Test records are maintained by the 3M Records Storage Department, 3M Center 
Buildings 223 and 224. 

4 Report 

4.1 Test Equipment 

The following equipment was used in this test: 

Mettler Precision Balance, model PM1200, serial number K8412, calibrated i/3/91. 

Mettler Precision Balance, model PM4000, serial number K40517, calibrated 
10/25/91 (denoted HOP4000 in Exhibit 1). 

Mettler Precision Balance, model PM4000, serial number L58924, calibrated 
4/3/91. 

Mettler Precision Balance, model PM6100, serial number L03873, calibrated 
4/3/91. 





















The environmentally-controlled rooms used in Building 235 at 3M Center were: 

72°F (22°C), 80% RH    Room 3B-355 
72°F (22°C), 50% RH    Room 3C-346 
72°F (22°C), 20% RH    Room 3B-359 

4.2 Test Facility Installation and Set-up 

Three equivalent environmental rooms were used, each containing a Mettler 
Precision Balance. Equivalent means the same rate of air flow and the same 
temperature, ±2°C. The relative humidity of the rooms was different. 

4.3 Test Procedures 

1. Use two temperature and humidity-controlled rooms at the same temperature 
(20 to 25°C) and at two different relative humidities, $H_1$ and $H_2$. By using 
rooms with a difference of about 60% RH, there will be enough mass change to 
measure with good certainty. 

2. Measure and record the initial weight of each of the samples. 

3. Place half of the samples in the $H_1$ chamber and half in the $H_2$ chamber. 

4. Measure and record the weight of each of the samples for at least 7 days, 
preferably longer. 

4.4 Test Results and Analysis 
4.4.1 Recorded Data 

At first, two cassettes were placed in each of the three environmentally-controlled 
rooms. The Sony T and U lots were measured using these three environments. As 
experience was gained with the technique and it was realized that the cassettes were 
delivered at close to 50% RH moisture content, the 72°F, 50% RH test condition was 
eliminated from the test procedure. The 72°F, 50% RH test condition data was 
analyzed in exactly the same way as the 72°F, 20% RH and 72°F, 80% RH data as 
shown below. 

In general, before measurements, the room temperature and relative humidity of 
the three environmentally-controlled rooms were measured. The RH measurements 
were generally lower than the nominal 20%, 50%, and 80% values. In calculations, 
the nominal values were used. 

Exhibit 1 is the actual recorded data collected during this test. 





4.4.2 Test Results 


Relative humidity as delivered was calculated as follows. Given the following 
definitions: 

$W_0$    Original weight of sample 
$W_1$    Final weight of sample in chamber 1 
$W_2$    Final weight of sample in chamber 2 
$H_0$    Original value of humidity 
$H_1$    Relative humidity in chamber 1 
$H_2$    Relative humidity in chamber 2 

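And assuming the weight change in each chamber is proportional to the difference 
between the original humidity and the humidity of that chamber, the following ratio 
holds: 

$$\frac{W_0 - W_1}{W_0 - W_2} = \frac{H_0 - H_1}{H_0 - H_2}$$

Solving for $H_0$ results in the following: 

$$H_0 = \frac{H_1 (W_2 - W_0) + H_2 (W_0 - W_1)}{W_2 - W_1}$$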
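
The algebra between the ratio and the solved form can be sketched as follows, using only the symbols defined above. As a quick consistency check, setting $W_0 = W_1$ (no weight change in chamber 1) gives $H_0 = H_1$, and setting $W_0 = W_2$ gives $H_0 = H_2$.

```latex
\begin{align*}
(W_0 - W_1)(H_0 - H_2) &= (W_0 - W_2)(H_0 - H_1) \\
H_0\left[(W_0 - W_1) - (W_0 - W_2)\right] &= H_2 (W_0 - W_1) - H_1 (W_0 - W_2) \\
H_0 (W_2 - W_1) &= H_1 (W_2 - W_0) + H_2 (W_0 - W_1) \\
H_0 &= \frac{H_1 (W_2 - W_0) + H_2 (W_0 - W_1)}{W_2 - W_1}
\end{align*}
```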

Given the following definitions: 

$h_0(t)$    Calculated value of humidity at some time 
$w_1(t)$    Weight of sample in chamber 1 at some time before the final time 
$w_2(t)$    Weight of sample in chamber 2 at some time before the final time 

And assuming the time constant for weight change is the same for both chambers 
(same temperature, etc.), then $w_1(t)$ and $w_2(t)$ can be substituted for $W_1$ and $W_2$ in the 
above formula for $H_0$ to result in: 

$$h_0(t) = \frac{H_1 (w_2(t) - W_0) + H_2 (W_0 - w_1(t))}{w_2(t) - w_1(t)}$$

The weights of all samples in the environment at $H_1$ were averaged to obtain a value 
of $w_1(t)$. Similarly, the weights of all samples in the environment at $H_2$ were 
averaged to obtain a value of $w_2(t)$. Values reported for each lot are averages over all 
$h_0(t)$ calculated. 
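
The calculation can be illustrated with a short Python sketch. It is not the program NML used; the function names are hypothetical, and the formula is written in terms of weight changes (current weight minus original weight) so that cassettes with different starting weights can be combined. The weights are those recorded in Exhibit 1 for Sony lot L (cassette 617 in the 20% RH room, cassette 618 in the 80% RH room), and the nominal humidities are used, as in the report.

```python
from statistics import mean


def delivered_rh(h1, h2, dw1, dw2):
    """Estimate the RH at delivery from weight changes in two chambers.

    dw1 and dw2 are the weight changes (current minus original weight) of
    the samples held at relative humidities h1 and h2 respectively.
    """
    if dw2 == dw1:
        raise ValueError("weight changes must differ between chambers")
    return (h1 * dw2 - h2 * dw1) / (dw2 - dw1)


def lot_average(h1, h2, series1, series2):
    """Average h0(t) over every measurement after the initial weighing."""
    estimates = []
    for w1, w2 in zip(series1[1:], series2[1:]):
        dw1 = w1 - series1[0]  # change for the sample in chamber 1
        dw2 = w2 - series2[0]  # change for the sample in chamber 2
        estimates.append(delivered_rh(h1, h2, dw1, dw2))
    return mean(estimates), estimates


# Weights in grams from Exhibit 1, Sony lot L, 6/7/91 through 7/12/91.
W617 = [1380.515, 1378.97, 1378.83, 1378.52, 1378.32,
        1378.28, 1378.19, 1378.12, 1378.08]   # 20% RH room
W618 = [1369.64, 1370.77, 1370.98, 1371.14, 1370.95,
        1371.43, 1371.51, 1371.61, 1371.87]   # 80% RH room

average, estimates = lot_average(20.0, 80.0, W617, W618)
print(f"average delivered RH: {average:.2f}%")
```

With these inputs the average comes out near 54% RH, in line with the value reported for lot L in the Summary. Small differences from the reported figure are to be expected, since the exact rounding and data selection used in the report are not stated.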

The following eight graphs show the change in weight of the cassettes plotted as a 
function of time. The first two graphs, with lot T and U data, were measured 
differently from the other six lots. Lot T and U cassettes were placed in 80, 50, and 
20% RH environments, while the other lots were placed in only 80 and 20% RH 
environments. Lot T tapes in the 80% RH room were swapped with tapes in the 20% 
RH room after 25 days. A similar swap was performed with the lot U tapes at 18 days. 
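
The weight-versus-time data also give a rough idea of how quickly a cassette responds to a new environment. The sketch below is an illustration only, not the method of the report: it assumes the last recorded weight is close to equilibrium and that the approach is roughly a single exponential, so the time at which 1 - 1/e (about 63%) of the total change has occurred approximates the time constant. The function name is hypothetical. The data are the Exhibit 1 weights for cassette 617 (Sony lot L, 20% RH room), with times rounded to whole days after the first weighing.

```python
import math


def response_time_days(days, weights, fraction=1 - 1 / math.e):
    """Days until `fraction` of the total weight change has occurred.

    Uses linear interpolation between the two measurements that bracket
    the target, so the result is only as fine as the sampling interval.
    """
    total = weights[-1] - weights[0]
    if total == 0:
        raise ValueError("no net weight change")
    for i in range(1, len(days)):
        # Progress is expressed as a fraction of the total change, so the
        # same code works for samples that gain or lose weight.
        before = (weights[i - 1] - weights[0]) / total
        after = (weights[i] - weights[0]) / total
        if after >= fraction:
            step = days[i] - days[i - 1]
            return days[i - 1] + (fraction - before) / (after - before) * step
    return days[-1]


# Cassette 617: 6/7, 6/10, 6/12, 6/14, 6/17, 6/21, 6/25, 7/1, 7/12 (1991).
DAYS = [0, 3, 5, 7, 10, 14, 18, 24, 35]
W617 = [1380.515, 1378.97, 1378.83, 1378.52, 1378.32,
        1378.28, 1378.19, 1378.12, 1378.08]

print(f"approximate response time: {response_time_days(DAYS, W617):.1f} days")
```

Because the first interval is three days long and most of the change happens within it, the estimate here is coarse; more frequent weighings in the first few days would be needed to resolve the early response.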





[Figure: RH Data for Sony Lot T. Δ Weight versus Time (Days) for cassettes 429, 430, 431, 432, 433, and 434.]

[Figure: RH Data for Sony Lot U. Δ Weight versus Time (Days) for cassettes 445, 446, 447, 448, 449, and 450.]

[Figure: RH Data for Sony Lot V. Δ Weight versus Time (Days) for cassettes 576, 577, 578, 579, 580, and 581.]

[Figure: RH Data for Sony Lot L. Δ Weight versus Time (Days) for cassettes 617 and 618.]

[Figure: RH Data for Ampex Lot X. Δ Weight versus Time (Days) for cassettes 582, 583, 584, 585, 566, and 587.]

[Figure: RH Data for Ampex Lot Y. Δ Weight versus Time (Days) for cassettes 554, 555, 556, 595, 614, and 621.]

[Figure: RH Data for Ampex Lot Z. Δ Weight versus Time (Days).]

[Figure: RH Data for Ampex Lot J. Δ Weight versus Time (Days).]


5 Conclusions 

All eight lots of tape tested were packaged at a moisture level corresponding to 45 to 
55% relative humidity at 72°F. 

6 Recommendations 

If the range of cassette moisture is outside of the acceptable 45 to 55% relative 
humidity range at 72°F, the tapes must be conditioned before use. 
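
As an illustration of how this recommendation might be applied at incoming inspection, the following sketch flags a cassette lot for conditioning when its delivered RH falls outside the 45 to 55% window. The function and constant names are hypothetical, and the 38.0% value in the usage lines is an invented input for illustration, not a measurement from this test.

```python
# Acceptable delivered-RH window in percent RH at 72°F, from the recommendation.
ACCEPTABLE_RH = (45.0, 55.0)


def needs_conditioning(delivered_rh_percent, acceptable=ACCEPTABLE_RH):
    """Return True when the delivered RH lies outside the acceptable window."""
    low, high = acceptable
    return not (low <= delivered_rh_percent <= high)


# Lot averages taken from the Summary tables.
for lot, rh in {"Sony T": 45.90, "Sony L": 54.05, "Ampex X": 52.63}.items():
    print(lot, "condition before use" if needs_conditioning(rh) else "usable as delivered")

# Hypothetical out-of-range delivery, for illustration only.
print(needs_conditioning(38.0))
```

The check treats the window limits as inclusive, which is an assumption; the report does not say how a lot exactly at 45% or 55% should be handled.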

7 Appendix 

Exhibit 1 is attached. 






Exhibit 1 Recorded Data 

Date      Time      Tape #  Mfg        Lot  Temp (°F)  RH (%)  Weight (g)   Scale
7/24/91   1:30 PM   619     Ampex      J    71         17      1345.76      HOP4000
7/26/91   3:30 PM   619     Ampex      J    71         18      1344.47      HOP4000
8/9/91    4:30 PM   619     Ampex      J    71         16      1343.25      HOP4000
8/29/91   5:30 PM   619     Ampex      J    65         16      1343.11      HOP4000
9/4/91    9:30 AM   619     Ampex      J    72         18.5    1343.33      ?
7/24/91   1:30 PM   620     Ampex      J    74         77      1351         HOP4000
7/26/91   3:30 PM   620     Ampex      J    74         77      1352.13      HOP4000
8/9/91    4:30 PM   620     Ampex      J    75         77      1352.99      HOP4000
8/29/91   5:30 PM   620     Ampex      J    71         62      1352.96      HOP4000
9/4/91    9:30 AM   620     Ampex      J    74         78.5    1353.37      PM6100
9/5/91    11:00 AM  620     Ampex      J    74         78      1353.39      PM6100
6/7/91    5:00 PM   617     Sony       L    72         20      1380.515     -
6/10/91   4:00 PM   617     Sony       L    72         20      1378.97      -
6/12/91   4:00 PM   617     Sony       L    72         20      1378.83      -
6/14/91   4:45 PM   617     Sony       L    72         20      1378.52      -
6/17/91   3:00 PM   617     Sony       L    72         20      1378.32      -
6/21/91   3:00 PM   617     Sony       L    72         20      1378.28      -
6/25/91   3:30 PM   617     Sony       L    72         20      1378.19      -
7/1/91    3:15 PM   617     Sony       L    72         20      1378.12      -
7/12/91   2:00 PM   617     Sony       L    72         20      1378.08      -
6/7/91    5:00 PM   618     Sony       L    72         80      1369.64      -
6/10/91   4:00 PM   618     Sony       L    72         80      1370.77      -
6/12/91   4:00 PM   618     Sony       L    72         80      1370.98      -
6/14/91   4:45 PM   618     Sony       L    72         80      1371.14      -
6/17/91   3:00 PM   618     Sony       L    72         60      1370.95      -
6/21/91   3:00 PM   618     Sony       L    72         80      1371.43      -
6/25/91   3:30 PM   618     Sony       L    72         80      1371.51      -
7/1/91    3:15 PM   618     Sony       L    72         80      1371.61      -
7/12/91   2:00 PM   618     Sony       L    72         80      1371.87      -
5/13/91   2:30 PM   429     Sony       T    -          -       669.475      PM4000
5/14/91   3:30 PM   429     Sony       T    71.5       18      668.96       PM4000
5/15/91   3:00 PM   429     Sony       T    71.5       18      668.81       PM4000
5/16/91   4:00 PM   429     Sony       T    71         17      668.74       PM4000
5/17/91   3:00 PM   429     Sony       T    71         19      668.7        PM4000
5/20/91   8:00 AM   429     Sony       T    71         18      668.57       PM4000
5/22/91   4:15 PM   429     Sony       T    71         18      668.552373   PM6100
5/24/91   4:00 PM   429     Sony       T    71         19      668.537367   PM6100
5/28/91   11:00 AM  429     Sony       T    71         20      668.527362   PM6100
5/31/91   3:30 PM   429     Sony       T    71         ?       668.26       PM4000
6/7/91    4:45 PM   429     Sony       T    -          -       668.47       PM4000
5/13/91   2:30 PM   430     Sony       T    -          -       668.75       PM4000
5/14/91   3:30 PM   430     Sony       T    71.5       18      668.25       PM4000
5/15/91   3:00 PM   430     Sony       T    71.5       18      668.09       PM4000
5/16/91   4:00 PM   430     Sony       T    71         17      668.02       PM4000
5/17/91   3:00 PM   430     Sony       T    71         19      667.97       PM4000
5/20/91   8:00 AM   430     Sony       T    71         18      667.84       PM4000
5/22/91   4:15 PM   430     Sony       T    71         18      667.837055   PM6100
5/28/91   11:00 AM  430     Sony       T    71         20      667.807042   PM6100
?         ?         430     Sony       T    71         20      667.817046   PM6100
6/7/91    4:45 PM   430     Sony       T    71         20      668.077162   PM6100
5/13/91   2:30 PM   431     Sony       T    -          -       670.66       PM4000
5/14/91   3:30 PM   431     Sony       T    -          -       670.79       PM4000
5/15/91   3:00 PM   431     Sony       T    70         49      671.02       PM4000
5/16/91   4:00 PM   431     Sony       T    70         48.5    671.04       PM4000
5/17/91   3:00 PM   431     Sony       T    70         48.5    671.03       PM4000
5/20/91   8:00 AM   431     Sony       T    70         48      670.99       PM4000
5/22/91   4:15 PM   431     Sony       T    70         48      671.053486   PM6100
5/24/91   ?         431     Sony       T    70         48      671.058488   PM6100
5/28/91   11:00 AM  431     Sony       T    70         48      671.028475   PM6100
5/31/91   3:30 PM   431     Sony       T    70         48      671.073495   PM6100
6/7/91    4:45 PM   431     Sony       T    -          -       671.1        PM4000
5/13/91   2:30 PM   432     Sony       T    -          -       668.67       PM4000
5/14/91   3:30 PM   432     Sony       T    -          -       668.79       PM4000
5/15/91   3:00 PM   432     Sony       T    70         49      668.83       PM4000
5/16/91   4:00 PM   432     Sony       T    70         48.5    668.85       PM4000
5/17/91   3:00 PM   432     Sony       T    70         48.5    668.85       PM4000
5/20/91   8:00 AM   432     Sony       T    70         48      668.8        PM4000
5/22/91   4:15 PM   432     Sony       T    70         48      668.867514   PM6100
5/24/91   4:00 PM   432     Sony       T    70         48      668.862511   PM6100
5/28/91   11:00 AM  432     Sony       T    70         48      668.827496   PM6100
5/31/91   3:30 PM   432     Sony       T    70         48      668.88252    PM6100
6/7/91    4:45 PM   432     Sony       T    -          -       668.9        PM4000
5/13/91   2:30 PM   433     Sony       T    -          -       672.71       PM4000
5/14/91   3:30 PM   433     Sony       T    -          -       673.18       PM4000
5/15/91   3:00 PM   433     Sony       T    75         72      673.39       PM4000
5/16/91   4:00 PM   433     Sony       T    75         72      673.53       PM4000
5/17/91   3:00 PM   433     Sony       T    74         73      673.61       PM4000
5/20/91   8:00 AM   433     Sony       T    74         74      673.75       PM4000
5/22/91   4:15 PM   433     Sony       T    75         73      673.80471    PM6100
5/24/91   4:00 PM   433     Sony       T    75         73      673.824719   PM6100
5/28/91   11:00 AM  433     Sony       T    75         72      673.829721   PM6100
5/31/91   3:30 PM   433     Sony       T    75         73      673.919761   PM6100
6/7/91    4:45 PM   433     Sony       T    -          -       674.04       PM4000
5/13/91   2:30 PM   434     Sony       T    -          -       670.69       PM4000
5/14/91   3:30 PM   434     Sony       T    -          -       671.13       PM4000
5/15/91   3:00 PM   434     Sony       T    75         72      671.32       PM4000
5/16/91   4:00 PM   434     Sony       T    75         72      671.435      PM4000
5/17/91   3:00 PM   434     Sony       T    74         73      671.52       PM4000
5/20/91   8:00 AM   434     Sony       T    74         74      671.66       PM4000
5/22/91   4:15 PM   434     Sony       T    75         73      671.718782   PM6100
5/24/91   4:00 PM   434     Sony       T    75         73      671.7588     PM6100
5/28/91   11:00 AM  434     Sony       T    75         72      671.778809   PM6100
5/31/91   3:30 PM   434     Sony       T    75         73      671.843838   PM6100
6/7/91    4:45 PM   434     Sony       T    -          -       671.98       PM4000
6/7/91    4:00 PM   429     Sony swap  T    -          -       668.496      -
6/10/91   5:00 PM   429     Sony swap  T    75         73      669.995      -
6/11/91   2:45 PM   429     Sony swap  T    74         73      670.093      -
6/12/91   3:15 PM   429     Sony swap  T    75         73      670.232      -
6/14/91   5:00 PM   429     Sony swap  T    75         74      670.393      -
6/17/91   2:30 PM   429     Sony swap  T    75         74      670.527      -
6/19/91   5:15 PM   429     Sony swap  T    75         74      670.606      -
6/21/91   2:30 PM   429     Sony swap  T    75         75      670.638      -
6/25/91   3:00 PM   429     Sony swap  T    75         75      670.702      -
7/1/91    2:45 PM   429     Sony swap  T    75         75      670.785      -
7/12/91   1:30 PM   429     Sony swap  T    75         77      670.868      -
7/26/91   2:00 PM   429     Sony swap  T    74         77      670.938      -
6/7/91    4:00 PM   430     Sony swap  T    -          -       667.766      -
6/10/91   5:00 PM   430     Sony swap  T    75         73      669.228      -
6/11/91   2:45 PM   430     Sony swap  T    74         73      669.378      -
6/12/91   3:15 PM   430     Sony swap  T    75         73      669.5        -
6/13/91   2:45 PM   430     Sony swap  T    75         73      669.592      -
6/14/91   5:00 PM   430     Sony swap  T    75         74      669.662      -
6/17/91   2:30 PM   430     Sony swap  T    75         74      669.806      -
6/19/91   5:15 PM   430     Sony swap  T    75         74      669.871      -
6/21/91   2:30 PM   430     Sony swap  T    75         75      669.915      -
6/25/91   3:00 PM   430     Sony swap  T    75         75      669.96       -
7/1/91    2:45 PM   430     Sony swap  T    75         75      670.048      -
7/12/91   1:30 PM   430     Sony swap  T    75         77      670.134      -
7/26/91   2:00 PM   430     Sony swap  T    74         77      670.201      -
6/7/91    4:00 PM   431     Sony swap  T    -          -       671.107      -
6/10/91   5:00 PM   431     Sony swap  T    75         73      671.745      -
6/11/91   2:45 PM   431     Sony swap  T    74         73      671.813      -
6/12/91   3:15 PM   431     Sony swap  T    75         73      671.871      -
6/13/91   2:45 PM   431     Sony swap  T    75         73      671.917      -
6/14/91   5:00 PM   431     Sony swap  T    75         74      671.959      -
6/17/91   2:30 PM   431     Sony swap  T    75         74      672.037      -
6/19/91   5:15 PM   431     Sony swap  T    75         74      672.087      -
6/21/91   2:30 PM   431     Sony swap  T    75         75      672.104      -
6/25/91   3:00 PM   431     Sony swap  T    75         75      672.129      -
7/1/91    2:45 PM   431     Sony swap  T    75         75      672.21       -
7/12/91   1:30 PM   431     Sony swap  T    75         77      672.288      -
7/26/91   2:00 PM   431     Sony swap  T    74         77      672.354      -
6/7/91    4:00 PM   432     Sony swap  T    -          -       668.922      -
6/10/91   5:00 PM   432     Sony swap  T    71         18      668.019      -
6/11/91   2:45 PM   432     Sony swap  T    71         16      667.942      -
6/12/91   3:15 PM   432     Sony swap  T    71         18      667.886      -
6/13/91   2:45 PM   432     Sony swap  T    71         19      667.859      -
6/14/91   5:00 PM   432     Sony swap  T    71         19      667.843      -
6/17/91   2:30 PM   432     Sony swap  T    71         18      667.771      -
6/19/91   5:15 PM   432     Sony swap  T    71         18      667.754      -
6/21/91   2:30 PM   432     Sony swap  T    71         20      667.796      -
6/25/91   3:00 PM   432     Sony swap  T    71         20      667.772      -
7/1/91    2:45 PM   432     Sony swap  T    71         18      667.753      -
7/12/91   1:30 PM   432     Sony swap  T    71         20      667.713      -
7/26/91   2:00 PM   432     Sony swap  T    71         18      667.654      -
6/7/91    4:00 PM   433     Sony swap  T    -          -       674.048      -
6/10/91   5:00 PM   433     Sony swap  T    71         18      672.462      -
6/11/91   2:45 PM   433     Sony swap  T    71         18      672.32       -
6/12/91   3:15 PM   433     Sony swap  T    71         18      672.209      -
6/13/91   2:45 PM   433     Sony swap  T    71         19      672.145      -
6/14/91   5:00 PM   433     Sony swap  T    71         19      672.103      -
6/17/91   2:30 PM   433     Sony swap  T    71         18      671.99       -
6/19/91   5:15 PM   433     Sony swap  T    71         13      671.944      -
6/21/91   2:30 PM   433     Sony swap  T    71         20      671.964      -
6/25/91   3:00 PM   433     Sony swap  T    71         20      671.939      -
7/1/91    2:45 PM   433     Sony swap  T    71         18      671.903      -
7/12/91   1:30 PM   433     Sony swap  T    71         20      671.859      -
7/26/91   2:00 PM   433     Sony swap  T    71         18      671.774      -
6/7/91    4:00 PM   434     Sony swap  T    -          -       671.994      -
6/10/91   5:00 PM   434     Sony swap  T    71         18      670.393      -
6/11/91   2:45 PM   434     Sony swap  T    71         18      670.25       -
6/12/91   3:15 PM   434     Sony swap  T    71         18      670.143      -
6/13/91   2:45 PM   434     Sony swap  T    71         19      670.075      -
6/14/91   5:00 PM   434     Sony swap  T    71         19      670.041      -
6/17/91   2:30 PM   434     Sony swap  T    71         18      669.918      -
6/19/91   5:15 PM   434     Sony swap  T    71         18      669.883      -
6/21/91   2:30 PM   434     Sony swap  T    71         20      669.922      -
6/25/91   3:00 PM   434     Sony swap  T    71         20      669.88       -
7/1/91    2:45 PM   434     Sony swap  T    71         18      669.856      -
7/12/91   1:30 PM   434     Sony swap  T    71         20      669.802      -
7/26/91   2:00 PM   434     Sony swap  T    71         18      669.72       -
5/20/91   10:15 AM  445     Sony       U    71         18      669.84       PM6100
5/21/91   4:45 PM   445     Sony       U    71         18      669.357732   PM6100
5/22/91   4:15 PM   445     Sony       U    71         18      669.217669   PM6100
5/23/91   6:00 PM   445     Sony       U    71         18      669.10762    -
5/24/91   4:00 PM   445     Sony       U    71         19      669.067603   -
5/28/91   11:00 AM  445     Sony       U    71         20      668.957554   -
5/31/91   4:00 PM   445     Sony       U    71         19      668.66       -
5/20/91   10:15 AM  446     Sony       U    71         18      674.67       PM6100
5/21/91   4:45 PM   446     Sony       U    71         18      674.129854   PM6100
5/22/91   4:15 PM   446     Sony       U    71         18      674.019805   PM6100
5/23/91   6:00 PM   446     Sony       U    71         18      673.889748   -
5/24/91   4:00 PM   446     Sony       U    71         19      673.854732   -
5/28/91   11:00 AM  446     Sony       U    71         20      673.739681   -
5/31/91   4:00 PM   446     Sony       U    71         19      673.44       -
5/20/91   10:15 AM  447     Sony       U    70         48      676.52       PM6100
5/21/91   4:45 PM   447     Sony       U    70         48      676.620962   PM6100
5/22/91   4:15 PM   447     Sony       U    70         48      676.680989   PM6100
5/23/91   6:00 PM   447     Sony       U    70         48.5    676.690994   -
5/24/91   4:00 PM   447     Sony       U    70         48      676.695996   -
5/28/91   11:00 AM  447     Sony       U    70         48      676.700998   -
5/31/91   4:00 PM   447     Sony       U    70         48      676.44       -
5/20/91   10:15 AM  448     Sony       U    70         48      675.19       PM6100
5/21/91   4:45 PM   448     Sony       U    70         48      675.340393   PM6100
5/22/91   4:15 PM   448     Sony       U    70         48      675.395417   PM6100
5/23/91   6:00 PM   448     Sony       U    70         48.5    675.410424   -
5/24/91   4:00 PM   448     Sony       U    70         48      675.40042    -
5/28/91   11:00 AM  448     Sony       U    70         48      675.420428   -
5/31/91   4:00 PM   448     Sony       U    70         48      675.15       -
5/20/91   10:15 AM  449     Sony       U    74         75      671.06       PM6100
5/21/91   4:45 PM   449     Sony       U    75         74      671.57872    PM6100
5/22/91   4:15 PM   449     Sony       U    75         73      671.778809   PM6100
5/23/91   6:00 PM   449     Sony       U    75         74      671.878853   -
5/24/91   4:00 PM   449     Sony       U    75         73      671.958888   -
5/28/91   11:00 AM  449     Sony       U    75         72      672.098951   -
5/31/91   4:00 PM   449     Sony       U    75         73      671.9        -
5/20/91   10:15 AM  450     Sony       U    74         75      675.07       PM6100
5/21/91   4:45 PM   450     Sony       U    75         74      675.640526   PM6100
5/22/91   4:15 PM   450     Sony       U    75         73      675.830611   PM6100
5/23/91   6:00 PM   450     Sony       U    75         74      675.925653   -
5/24/91   4:00 PM   450     Sony       U    75         73      676.000687   -
5/28/91   11:00 AM  450     Sony       U    75         72      676.140749   -
5/31/91   4:00 PM   450     Sony       U    75         73      675.95       -
6/7/91    4:00 PM   445     Sony swap  U    -          -       668.896      -
6/10/91   5:00 PM   445     Sony swap  U    75         73      670.495      -
6/11/91   2:45 PM   445     Sony swap  U    74         73      670.648      -
6/12/91   3:15 PM   445     Sony swap  U    75         73      670.746      -
6/13/91   2:45 PM   445     Sony swap  U    75         73      670.84       -
6/14/91   5:00 PM   445     Sony swap  U    75         73      670.902      -
6/17/91   2:30 PM   445     Sony swap  U    75         74      671.051      -
6/19/91   5:15 PM   445     Sony swap  U    75         74      671.093      -
6/21/91   2:30 PM   445     Sony swap  U    75         75      671.131      -
6/25/91   3:00 PM   445     Sony swap  U    75         75      671.168      -
7/1/91    2:45 PM   445     Sony swap  U    75         75      671.231      -
7/12/91   1:30 PM   445     Sony swap  U    75         77      671.307      -
7/26/91   2:00 PM   445     Sony swap  U    74         77      671.639      -
6/7/91    4:00 PM   446     Sony swap  U    -          -       673.668      -
6/10/91   5:00 PM   446     Sony swap  U    75         73      675.206      -
6/11/91   2:45 PM   446     Sony swap  U    74         73      675.361      -
6/12/91   3:15 PM   446     Sony swap  U    75         73      675.492      -
6/13/91   2:45 PM   446     Sony swap  U    75         73      675.582      -
6/14/91   5:00 PM   446     Sony swap  U    75         73      675.666      -
6/17/91   2:30 PM   446     Sony swap  U    75         74      675.801      -
6/19/91   5:15 PM   446     Sony swap  U    75         74      675.87       -
6/21/91   2:30 PM   446     Sony swap  U    75         75      675.9        -
6/25/91   3:00 PM   446     Sony swap  U    75         75      675.951      -
7/1/91    2:45 PM   446     Sony swap  U    75         75      676.038      -
7/12/91   1:30 PM   446     Sony swap  U    75         77      676.121      -
7/26/91   2:00 PM   446     Sony swap  U    74         77      676.189      -
6/7/91    4:00 PM   447     Sony swap  U    -          -       676.81       -
6/10/91   5:00 PM   447     Sony swap  U    75         73      677.453      -
6/11/91   2:45 PM   447     Sony swap  U    74         73      677.528      -
6/12/91   3:15 PM   447     Sony swap  U    75         73      677.584      -
6/13/91   2:45 PM   447     Sony swap  U    75         73      677.6?       -
6/14/91   5:00 PM   447     Sony swap  U    75         73      677.666      -
6/17/91   2:30 PM   447     Sony swap  U    75         74      677.765      -
6/19/91   5:15 PM   447     Sony swap  U    75         74      677.802      -
6/21/91   2:30 PM   447     Sony swap  U    75         75      677.827      -
6/25/91   3:00 PM   447     Sony swap  U    75         75      677.86       -
7/1/91    2:45 PM   447     Sony swap  U    75         75      677.935      -
7/12/91   1:30 PM   447     Sony swap  U    75         77      678.006      -
7/26/91   2:00 PM   447     Sony swap  U    74         77      678.074      -
6/7/91    4:00 PM   448     Sony swap  U    -          -       675.507      -
6/10/91   5:00 PM   448     Sony swap  U    71         18      674.622      -
6/11/91   2:45 PM   448     Sony swap  U    71         18      674.544      -
6/12/91   3:15 PM   448     Sony swap  U    71         18      674.487      -
6/13/91   2:45 PM   448     Sony swap  U    71         19      674.457      -
6/14/91   5:00 PM   448     Sony swap  U    71         19      674.444      -
6/17/91   2:30 PM   448     Sony swap  U    71         18      674.362      -
6/19/91   5:15 PM   448     Sony swap  U    71         18      674.341      -
6/21/91   2:30 PM   448     Sony swap  U    71         20      674.388      -
6/25/91   3:00 PM   448     Sony swap  U    71         20      674.35       -
7/1/91    2:45 PM   448     Sony swap  U    71         18      674.339      -
7/12/91   1:30 PM   448     Sony swap  U    71         20      674.313      -
7/26/91   2:00 PM   448     Sony swap  U    71         18      674.216      -
6/7/91    4:00 PM   449     Sony swap  U    -          -       672.411      -
6/10/91   5:00 PM   449     Sony swap  U    71         18      670.747      -
6/11/91   2:45 PM   449     Sony swap  U    71         18      670.613      -
6/12/91   3:15 PM   449     Sony swap  U    71         18      670.497      -
6/13/91   2:45 PM   449     Sony swap  U    71         19      670.43       -
6/14/91   5:00 PM   449     Sony swap  U    71         19      670.391      -
6/17/91   2:30 PM   449     Sony swap  U    71         18      670.278      -
6/19/91   5:15 PM   449     Sony swap  U    71         18      670.245      -
6/21/91   2:30 PM   449     Sony swap  U    71         20      670.287      -
6/25/91   3:00 PM   449     Sony swap  U    71         20      670.248      -
7/1/91    2:45 PM   449     Sony swap  U    71         18      670.223      -
7/12/91   1:30 PM   449     Sony swap  U    71         20      670.202      -
7/26/91   2:00 PM   449     Sony swap  U    71         18      670.0826     -
6/7/91    4:00 PM   450     Sony swap  U    -          -       676.446      -
6/10/91   5:00 PM   450     Sony swap  U    71         18      674.812      -
6/11/91   2:45 PM   450     Sony swap  U    71         18      674.67       -
6/12/91   3:15 PM   450     Sony swap  U    71         18      674.56       -
6/13/91   2:45 PM   450     Sony swap  U    71         19      674.511      -
6/14/91   5:00 PM   450     Sony swap  U    71         19      674.457      -
6/17/91   2:30 PM   450     Sony swap  U    71         18      674.332      -
6/19/91   5:15 PM   450     Sony swap  U    71         18      674.297      -
6/21/91   2:30 PM   450     Sony swap  U    71         20      674.336      -
6/25/91   3:00 PM   450     Sony swap  U    71         20      674.299      -
7/1/91    2:45 PM   450     Sony swap  U    71         18      674.267      -
7/12/91   1:30 PM   450     Sony swap  U    71         20      674.237      -
7/26/91   2:00 PM   450     Sony swap  U    71         18      674.133      -
6/28/91   2:30 PM   576     Sony       V    71         18      673.645      -
7/1/91    2:45 PM   576     Sony       V    71         18      672.732      -
7/2/91    3:30 PM   576     Sony       V    71         18      672.619      -
7/3/91    1:30 PM   576     Sony       V    71         19      672.57       -
7/8/91    5:30 PM   576     Sony       V    71         18      672.416      -
7/12/91   1:30 PM   576     Sony       V    71         20      672.404      -
7/17/91   4:45 PM   576     Sony       V    71         19      672.329      -
7/19/91   3:15 PM   576     Sony       V    71         18      672.313      -
7/23/91   4:30 PM   576     Sony       V    71         18      672.325      -
7/26/91   1:45 PM   576     Sony       V    71         18      672.23       -
7/31/91   1:45 PM   576     Sony       V    71         18      672.279      -
8/7/91    5:00 PM   576     Sony       V    71         17      672.249      -
8/16/91   3:45 PM   576     Sony       V    71         17      672.269      -
8/29/91   4:30 PM   576     Sony       V    65         16      672.274      -
6/28/91   2:30 PM   577     Sony       V    71         18      672.411      -
7/1/91    2:45 PM   577     Sony       V    71         18      671.591      -
7/2/91    3:30 PM   577     Sony       V    71         18      671.464      -
7/3/91    1:30 PM   577     Sony       V    71         19      671.43       -
7/8/91    5:30 PM   577     Sony       V    71         18      671.239      -
7/12/91   1:30 PM   577     Sony       V    71         20      671.223      -
7/17/91   4:45 PM   577     Sony       V    71         19      671.142      -
7/19/91   3:15 PM   577     Sony       V    71         18      671.141      -
7/23/91   4:30 PM   577     Sony       V    71         18      671.126      -
7/26/91   1:45 PM   577     Sony       V    71         18      671.088      -
7/31/91   1:45 PM   577     Sony       V    71         18      671.089      -
8/7/91    5:00 PM   577     Sony       V    71         17      671.058      -
8/16/91   3:45 PM   577     Sony       V    71         17      671.07       -
8/29/91   4:30 PM   577     Sony       V    65         16      671.076      -
6/28/91   2:30 PM   578     Sony       V    71         18      673.42       -
7/1/91    2:45 PM   578     Sony       V    71         18      672.586      -
7/2/91    3:30 PM   578     Sony       V    71         18      672.455      -
7/3/91    1:30 PM   578     Sony       V    71         19      672.414      -
7/8/91    5:30 PM   578     Sony       V    71         18      672.244      -
7/12/91   1:30 PM   578     Sony       V    71         20      672.246      -
7/17/91   4:45 PM   578     Sony       V    71         19      672.156      -
7/19/91   3:15 PM   578     Sony       V    71         18      672.135      -
7/23/91   4:30 PM   578     Sony       V    71         18      672.145      -
7/26/91   1:45 PM   578     Sony       V    71         18      672.103      -
7/31/91   1:45 PM   578     Sony       V    71         18      672.107      -
8/7/91    5:00 PM   578     Sony       V    71         17      672.065      -
8/16/91   3:45 PM   578     Sony       V    71         17      672.076      -
8/29/91   4:30 PM   578     Sony       V    65         16      672.087      -
6/28/91   2:30 PM   579     Sony       V    75         74      670.963      -
7/1/91    2:45 PM   579     Sony       V    75         75      671.64       -
7/2/91    3:30 PM   579     Sony       V    75         75      671.718      -
7/3/91    1:30 PM   579     Sony       V    75         74      671.772      -
7/8/91    5:30 PM   579     Sony       V    75         74      671.921      -
7/12/91   1:30 PM   579     Sony       V    75         77      671.977      -
7/17/91   4:45 PM   579     Sony       V    75         76      672.013      -
7/19/91   3:15 PM   579     Sony       V    75         75      672.023      -
233 











Exhibit 1 Recorded Data (continued)

Date      Time      Tape #  Mfr    Lot  Temp (°F)  RH (%)  Weight (g)  Scale
7/23/91   4:30 PM   579     Sony   V    74         76      672.034
7/26/91   1:45 PM   579     Sony   V    74         77      672.097
7/31/91   1:45 PM   579     Sony   V    74         77      672.149
8/7/91    5:00 PM   579     Sony   V    74         78      672.22
8/16/91   3:45 PM   579     Sony   V    75         77      672.27
8/29/91   4:30 PM   579     Sony   V    71         62      672.187
6/28/91   2:30 PM   580     Sony   V    75         74      666.127
7/1/91    2:45 PM   580     Sony   V    75         75      666.815
7/2/91    3:30 PM   580     Sony   V    75         75      666.888
7/3/91    1:30 PM   580     Sony   V    75         74      666.948
7/8/91    5:30 PM   580     Sony   V    75         74      667.09
7/12/91   1:30 PM   580     Sony   V    75         77      667.147
7/17/91   4:45 PM   580     Sony   V    75         76      667.185
7/19/91   3:15 PM   580     Sony   V    75         75      667.201
7/23/91   4:30 PM   580     Sony   V    74         76      667.207
7/26/91   1:45 PM   580     Sony   V    74         77      667.263
7/31/91   1:45 PM   580     Sony   V    74         77      667.361
8/7/91    5:00 PM   580     Sony   V    74         78      667.389
8/16/91   3:45 PM   580     Sony   V    75         77      667.43
8/29/91   4:30 PM   580     Sony   V    71         62      667.354
6/28/91   2:30 PM   581     Sony   V    75         74      673.536
7/1/91    2:45 PM   581     Sony   V    75         75      674.186
7/2/91    3:30 PM   581     Sony   V    75         75      674.259
7/3/91    1:30 PM   581     Sony   V    75         74      674.315
7/8/91    5:30 PM   581     Sony   V    75         74      674.466
7/12/91   1:30 PM   581     Sony   V    75         77      674.524
7/17/91   4:45 PM   581     Sony   V    75         76      674.544
7/19/91   3:15 PM   581     Sony   V    75         75      674.573
7/23/91   4:30 PM   581     Sony   V    74         76      674.588
7/26/91   1:45 PM   581     Sony   V    74         77      674.67?
7/31/91   1:45 PM   581     Sony   V    74         77      674.709
8/7/91    5:00 PM   581     Sony   V    74         78      674.774
8/16/91   3:45 PM   581     Sony   V    75         77      674.812
8/29/91   4:30 PM   581     Sony   V    71         62      674.737
7/24/91   1:15 PM   582     Ampex  X    71         17      682.356
7/25/91   4:00 PM   582     Ampex  X    71         17      681.761
7/26/91   1:45 PM   582     Ampex  X    71         18      681.586
7/31/91   1:45 PM   582     Ampex  X    71         18      681.023
8/1/91    4:00 PM   582     Ampex  X    71         18      680.963
8/2/91    3:30 PM   582     Ampex  X    71         18      680.956
8/6/91    5:15 PM   582     Ampex  X    71         18      680.822
8/8/91    3:00 PM   582     Ampex  X    71         17      680.823
8/14/91   4:30 PM   582     Ampex  X    71         16      680.763
8/16/91   3:45 PM   582     Ampex  X    71         17      680.???
8/23/91   3:45 PM   582     Ampex  X    72         18      680.713
8/29/91   4:30 PM   582     Ampex  X    65         16      680.750
7/24/91   1:15 PM   583     Ampex  X    71         17      683.553
7/25/91   4:00 PM   583     Ampex  X    71         17      682.946
7/26/91   1:45 PM   583     Ampex  X    71         18      682.652











Exhibit 1 Recorded Data (continued)

Date      Time      Tape #  Mfr    Lot  Temp (°F)  RH (%)  Weight (g)  Scale
7/31/91   1:45 PM   583     Ampex  X    71         18      682.178
8/1/91    4:00 PM   583     Ampex  X    71         18      682.095
8/2/91    3:30 PM   583     Ampex  X    71         18      682.06
8/6/91    5:15 PM   583     Ampex  X    71         18      681.925
8/8/91    3:00 PM   583     Ampex  X    71         17      681.93
8/14/91   4:30 PM   583     Ampex  X    71         16      681.891
8/16/91   3:45 PM   583     Ampex  X    71         17      681.887
8/23/91   3:45 PM   583     Ampex  X    72         18      681.817
8/29/91   4:30 PM   583     Ampex  X    65         16      681.856
7/24/91   1:15 PM   584     Ampex  X    71         17      681.779
7/25/91   4:00 PM   584     Ampex  X    71         17      681.244
7/26/91   1:45 PM   584     Ampex  X    71         18      680.965
7/31/91   1:45 PM   584     Ampex  X    71         18      680.422
8/1/91    4:00 PM   584     Ampex  X    71         18      680.425
8/2/91    3:30 PM   584     Ampex  X    71         18      680.354
8/6/91    5:15 PM   584     Ampex  X    71         18      680.199
8/8/91    3:00 PM   584     Ampex  X    71         17      680.254
8/14/91   4:30 PM   584     Ampex  X    71         16      680.164
8/16/91   3:45 PM   584     Ampex  X    71         17      680.161
8/23/91   3:45 PM   584     Ampex  X    72         18      680.111
8/29/91   4:30 PM   584     Ampex  X    65         16      680.149
7/24/91   1:15 PM   585     Ampex  X    74         77      681.283
7/25/91   4:00 PM   585     Ampex  X    75         77      681.732
7/26/91   1:45 PM   585     Ampex  X    74         77      681.892
7/31/91   1:45 PM   585     Ampex  X    74         77      682.407
8/1/91    4:00 PM   585     Ampex  X    75         77      682.447
8/2/91    3:30 PM   585     Ampex  X    74         77      682.436
8/6/91    5:15 PM   585     Ampex  X    74         77      682.464
8/8/91    3:00 PM   585     Ampex  X    75         78      682.562
8/14/91   4:30 PM   585     Ampex  X    75         77      682.609
8/16/91   3:45 PM   585     Ampex  X    75         77      682.648
8/23/91   3:45 PM   585     Ampex  X    74         77      682.704
8/29/91   4:30 PM   585     Ampex  X    71         62      682.6?3
7/24/91   1:15 PM   586     Ampex  X    74         77      683.727
7/25/91   4:00 PM   586     Ampex  X    75         77      684.206
7/26/91   1:45 PM   586     Ampex  X    74         77      684.40
7/31/91   1:45 PM   586     Ampex  X    74         77      684.918
8/1/91    4:00 PM   586     Ampex  X    75         77      684.965
8/2/91    3:30 PM   586     Ampex  X    74         77      684.972
8/6/91    5:15 PM   586     Ampex  X    74         77      685.025
8/8/91    3:00 PM   586     Ampex  X    75         78      685.116
8/14/91   4:30 PM   586     Ampex  X    75         77      685.179
8/16/91   3:45 PM   586     Ampex  X    75         77      685.207
8/23/91   3:45 PM   586     Ampex  X    74         77      685.251
8/29/91   4:30 PM   586     Ampex  X    71         62      685.167
7/24/91   1:15 PM   587     Ampex  X    74         77      682.72
7/25/91   4:00 PM   587     Ampex  X    75         77      683.164
7/26/91   1:45 PM   587     Ampex  X    74         77      683.384
7/31/91   1:45 PM   587     Ampex  X    74         77      683.861



Exhibit 1 Recorded Data (continued)

Date      Time      Tape #  Mfr    Lot  Temp (°F)  RH (%)  Weight (g)  Scale
8/1/91    4:00 PM   587     Ampex  X    75         77      683.929
8/2/91    3:30 PM   587     Ampex  X    74         77      683.922
8/6/91    5:15 PM   587     Ampex  X    74         77      683.987
8/8/91    3:00 PM   587     Ampex  X    75         78      684.069
8/14/91   4:30 PM   587     Ampex  X    75         77      684.137
8/16/91   3:45 PM   587     Ampex  X    75         77      684.159
8/23/91   3:45 PM   587     Ampex  X    74         77      684.189
8/29/91   4:30 PM   587     Ampex  X    71         62      684.12
8/1/91    3:45 PM   554     Ampex  Y    71         18      681.33
8/2/91    3:30 PM   554     Ampex  Y    71         18      680.715
8/5/91    6:15 PM   554     Ampex  Y    71         17      680.?03
8/6/91    5:15 PM   554     Ampex  Y    71         17      679.938
8/7/91    5:00 PM   554     Ampex  Y    71         17      679.886
8/8/91    3:00 PM   554     Ampex  Y    71         17      679.949
8/9/91    4:15 PM   554     Ampex  Y    71         16      679.83
8/14/91   4:30 PM   554     Ampex  Y    71         16      679.735
8/16/91   3:45 PM   554     Ampex  Y    71         17      679.762
8/1/91    3:45 PM   555     Ampex  Y    71         18      684.485
8/2/91    3:30 PM   555     Ampex  Y    71         18      683.846
8/5/91    6:15 PM   555     Ampex  Y    71         17      683.244
8/6/91    5:15 PM   555     Ampex  Y    71         17      683.14
8/7/91    5:00 PM   555     Ampex  Y    71         17      683.051
8/8/91    3:00 PM   555     Ampex  Y    71         17      683.156
8/9/91    4:15 PM   555     Ampex  Y    71         16      683.006
8/14/91   4:30 PM   555     Ampex  Y    71         16      682.876
8/16/91   3:45 PM   555     Ampex  Y    71         17      682.885
8/1/91    3:45 PM   556     Ampex  Y    71         18      684.906
8/2/91    3:30 PM   556     Ampex  Y    71         18      684.193
8/5/91    6:15 PM   556     Ampex  Y    71         17      683.449
8/6/91    5:15 PM   556     Ampex  Y    71         17      683.36
8/7/91    5:00 PM   556     Ampex  Y    71         17      683.312
8/8/91    3:00 PM   556     Ampex  Y    71         17      683.299
8/9/91    4:15 PM   556     Ampex  Y    71         16      683.28
8/14/91   4:30 PM   556     Ampex  Y    71         16      683.158
8/16/91   3:45 PM   556     Ampex  Y    71         17      683.145
8/1/91    3:45 PM   595     Ampex  Y    75         77      684.943
8/2/91    3:30 PM   595     Ampex  Y    74         77      685.366
8/5/91    6:15 PM   595     Ampex  Y    74         76      685.841
8/6/91    5:15 PM   595     Ampex  Y    74         77      685.919
8/7/91    5:00 PM   595     Ampex  Y    74         78      685.998
8/8/91    3:00 PM   595     Ampex  Y    75         78      686.06
8/9/91    4:15 PM   595     Ampex  Y    75         77      686.135
8/14/91   4:30 PM   595     Ampex  Y    75         77      686.273
8/16/91   3:45 PM   595     Ampex  Y    75         77      686.302
8/29/91   4:30 PM   595     Ampex  Y    71         62      686.284
8/1/91    3:45 PM   614     Ampex  Y    75         77      682.098
8/2/91    3:30 PM   614     Ampex  Y    74         77      682.55
8/5/91    6:15 PM   614     Ampex  Y    74         76      682.987
8/6/91    5:15 PM   614     Ampex  Y    74         77      683.055



Exhibit 1 Recorded Data (continued)

Date      Time      Tape #  Mfr    Lot  Temp (°F)  RH (%)  Weight (g)  Scale
8/7/91    5:00 PM   614     Ampex  Y    74         78      683.138
8/8/91    3:00 PM   614     Ampex  Y    75         78      683.188
8/9/91    4:15 PM   614     Ampex  Y    75         77      683.258
8/14/91   4:30 PM   614     Ampex  Y    75         77      683.392
8/16/91   3:45 PM   614     Ampex  Y    75         77      683.411
8/29/91   4:30 PM   614     Ampex  Y    71         62      683.37?
8/1/91    3:45 PM   621     Ampex  Y    75         77      682.395
8/2/91    3:30 PM   621     Ampex  Y    74         77      682.72
8/5/91    6:15 PM   621     Ampex  Y    74         76      683.215
8/6/91    5:15 PM   621     Ampex  Y    74         77      683.271
8/7/91    5:00 PM   621     Ampex  Y    74         78      683.37
8/8/91    3:00 PM   621     Ampex  Y    75         78      683.418
8/9/91    4:15 PM   621     Ampex  Y    75         77      683.493
8/14/91   4:30 PM   621     Ampex  Y    75         77      683.636
8/16/91   3:45 PM   621     Ampex  Y    75         77      683.643
8/29/91   4:30 PM   621     Ampex  Y    71         62      683.579
9/6/91    9:30 AM   621     Ampex  Y    74         78      683.86      PM6100
9/6/91    2:45 PM   557     Ampex  Z    71         17      680.82
9/7/91    11:30 PM  557     Ampex  Z    72         18      680.24
9/9/91    3:45 PM   557     Ampex  Z    72         18      679.87
9/10/91   2:00 PM   557     Ampex  Z    72         16      679.7
9/11/91   3:00 PM   557     Ampex  Z    72         18      679.63
9/12/91   2:00 PM   557     Ampex  Z    71         17      679.6
9/13/91   4:00 PM   557     Ampex  Z    71         17      679.58
9/6/91    2:45 PM   558     Ampex  Z    71         17      683.68
9/7/91    11:30 PM  558     Ampex  Z    72         18      683.065
9/9/91    3:45 PM   558     Ampex  Z    72         18      682.69
9/10/91   2:00 PM   558     Ampex  Z    72         16      682.5
9/11/91   3:00 PM   558     Ampex  Z    72         18      682.4
9/12/91   2:00 PM   558     Ampex  Z    71         17      682.41
9/13/91   4:00 PM   558     Ampex  Z    71         17      682.36
9/6/91    2:45 PM   559     Ampex  Z    71         17      682.32
9/7/91    11:30 PM  559     Ampex  Z    72         18      681.7
9/9/91    3:45 PM   559     Ampex  Z    72         18      681.38
9/10/91   2:00 PM   559     Ampex  Z    72         16      681.21
9/11/91   3:00 PM   559     Ampex  Z    72         18      681.15
9/12/91   2:00 PM   559     Ampex  Z    71         17      681.15
9/13/91   4:00 PM   559     Ampex  Z    71         17      681.09
9/6/91    2:45 PM   560     Ampex  Z    74         79      682.63
9/7/91    11:30 PM  560     Ampex  Z    74.5       78      683.29
9/9/91    3:45 PM   560     Ampex  Z    74         78      683.52
9/10/91   2:00 PM   560     Ampex  Z    75         79      683.58
9/11/91   3:00 PM   560     Ampex  Z    74         80      683.69
9/12/91   2:00 PM   560     Ampex  Z    74         80      683.79
9/13/91   4:00 PM   560     Ampex  Z    75         79      683.84
9/6/91    2:45 PM   622     Ampex  Z    74         79      680.83
9/7/91    11:30 PM  622     Ampex  Z    74.5       78      681.48
9/9/91    3:45 PM   622     Ampex  Z    74         78      681.71
9/10/91   2:00 PM   622     Ampex  Z    75         79      681.77










Exhibit 1 Recorded Data (continued)

Date      Time      Tape #  Mfr    Lot  Temp (°F)  RH (%)  Weight (g)  Scale
9/11/91   3:00 PM   622     Ampex  Z    74         80      681.88
9/12/91   2:00 PM   622     Ampex  Z    74         80      681.96
9/13/91   4:00 PM   622     Ampex  Z    75         79      682.03
9/6/91    2:45 PM   674     Ampex  Z    74         79      687.26
9/7/91    11:30 PM  674     Ampex  Z    74.5       78      687.9
9/9/91    3:45 PM   674     Ampex  Z    74         78      688.14
9/10/91   2:00 PM   674     Ampex  Z    75         79      688.2
9/11/91   3:00 PM   674     Ampex  Z    74         80      688.31
9/12/91   2:00 PM   674     Ampex  Z    74         80      688.41
9/13/91   4:00 PM   674     Ampex  Z    75         79      688.47


















N93-30467


Grand Challenges in Mass Storage: A Systems Integrator's Perspective


Richard R. Lee

Data Storage Technologies, Inc.

879 Franklin Turnpike, P. O. Box 1283
Ridgewood, NJ 07450

Daniel G. Mintz, WJ Culver Consulting, Inc.
8800 Leesburg Pike, Suite 403, Vienna, VA 22183





Within today's much ballyhooed supercomputing environment, with its GFLOPS of CPU
power and Gigabit networks, there exists a major roadblock to computing success: that of Mass
Storage.

We consider the solution to this mass storage problem to be one of the "Grand Challenges"
facing the computer industry today, as well as long into the future. It has become obvious to us,
as well as many others in the industry, that there is no clear single solution in sight.

The Systems Integrator today is faced with a myriad of quandaries in approaching this
challenge. He must first be innovative in approach; second, choose hardware solutions that are
volumetric efficient; high in signal bandwidth; available from multiple sources; competitively
priced; and have forward growth extendibility. In addition he must also comply with a variety
of mandated, and often conflicting, software standards (GOSIP, POSIX, IEEE MSRM 4.0 and
others), and finally he must deliver a systems solution with the "most bang for the buck" in
terms of cost vs. performance factors. These quandaries challenge the Systems Integrator to
"push the envelope" in terms of his or her ingenuity and innovation on an almost daily basis.

Within our presentation we will explore this dynamic further, and attempt to acquaint the
audience with rational approaches to this "Grand Challenge".


Introduction:

WJ Culver Consulting and Data Storage Technologies are collaborating on this
presentation based on our individual efforts in supporting EOSDIS ECS Phase C/D Proposal
teams, and on our joint preparation of a winning solution for the NASA LaRC EOSDIS "Version
0" DAAC^1, whose contract was recently awarded to the WJ Culver Consulting team and will be
installed on site at LaRC during July and August of this year (1992).

Our two organizations have been intimately involved in many facets of mass storage
system design and integration, and we feel that we have special insights into the problems
facing this segment of the computer industry. We will explore this subject from the perspective
of having to design and field systems today, with vision towards what the future holds.


Definitions: 

Mass Storage has become a widely and many times improperly used term today. It can be
found as a reference to a simple disk or tape drive or in referring to PetaByte level systems. For
the sake of consistency we will define Mass Storage as any type of storage system exceeding .1TB in
total size (on-line), under control of a centralized File Management scheme. (Authors' Note: We
hope that this definition does not confuse the reader any further than he might already be
confused on this subject!)


^1 The significance of the "Version 0" prototype has been heightened in recent months based on reports to
Congress by the GAO, and to NASA by the NRC, emphasizing the importance of this prototyping effort prior to the
fielding of the EOSDIS ECS systems.





On-line refers to a storage device, DMA, Network or Peripherally connected, which
responds to file requests in 0 to 15 seconds (approximately).

Off-line refers to a storage device, Network or Peripherally connected, which responds to
file requests in several minutes to several days or weeks (approximately).

An Automated Library is a physical volume repository (PVR) which houses bitfile data
contained in volumes of robotically handled cartridges or cylinders. These systems are
Network or Peripherally connected and respond to file requests in 45 seconds to tens of
minutes depending upon the size and physical architecture of the repository.
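
The three response-time definitions above can be read as a rough classification rule. The following is only an illustrative sketch in Python, not something taken from the paper: the 30-minute cutoff standing in for "tens of minutes" is an assumption, and because the text leaves the 15 to 45 second range undefined and lets the off-line and library ranges overlap, the sketch arbitrarily folds both into the automated-library tier.

```python
from enum import Enum


class Tier(Enum):
    ON_LINE = "on-line"
    AUTOMATED_LIBRARY = "automated library"
    OFF_LINE = "off-line"


# Approximate bounds in seconds, read off the definitions in the text.
ON_LINE_MAX_S = 15        # on-line: roughly 0 to 15 seconds
LIBRARY_MAX_S = 30 * 60   # "tens of minutes"; 30 minutes is an assumed cutoff


def classify(response_s: float) -> Tier:
    """Map a typical file-request response time to a storage tier."""
    if response_s < 0:
        raise ValueError("response time cannot be negative")
    if response_s <= ON_LINE_MAX_S:
        return Tier.ON_LINE
    if response_s <= LIBRARY_MAX_S:
        # Covers the undefined 15-45 s gap as well as the 45 s+ library range.
        return Tier.AUTOMATED_LIBRARY
    return Tier.OFF_LINE


if __name__ == "__main__":
    for seconds in (5, 120, 86_400):
        print(seconds, "s ->", classify(seconds).value)
```

A real installation would classify devices by how they are attached and managed rather than by a measured latency alone; the thresholds here only make the quoted ranges concrete.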


Today's Supercomputing Environment:

With very few exceptions, today's supercomputing center has become a hodge podge of many
different types of CPU's (vector, scalar, parallel/massively parallel, Visualization, RISC, CISC,
etc., etc.). Each of these units is in competition with the others for dominance of system
resources, and all are interconnected by elaborate networking schemes (HyperChannel, FDDI,
HIPPI, kluge, and others). For many years now the high performance computing industry has
been focused only on how to achieve the highest level of CPU performance (many times by
networking heterogeneous CPU's together) without paying any attention to the "crisis in
storage management" these systems have created. This blind pursuit of computing horsepower
has created an acute crisis in today's data center: that of how to manage the huge volumes of
input and output data required/produced by these machines.

These advanced processors produce volumes of bitfile data well beyond most systems
managers' wildest hallucinations. Local and network disk and tape systems are overwhelmed
by the growing demand for bitfile data, and their ability to store and archive vast numbers of
exponentially increasing bitfiles is critically inadequate. Current disk farms can only store
files for a period of 12-36 hours before being overwritten to make way for new bitfiles^2. This
has created an often untenable situation for both the systems manager and the end-user.

To better manage this critical task, dedicated file managers and intricate software schemes
have been developed by many. These systems attempt to keep ahead of user needs by staging
and re-staging bitfile data sets to the most appropriate media for the level of activity
encountered. This is usually done over relatively low-bandwidth channels on low efficiency
and high cost media (magnetic disk and square tape). Traditional off-line round/square tape
drives have been augmented by tape libraries which behave like "slow-moving" freight trains
of bitfiles: "The information gets there eventually, but it's a bumpy ride along the way".
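
The staging and re-staging described above can be illustrated with a short Python sketch. It is a toy policy under stated assumptions, not the logic of any product the paper names: the three tier names, the `Bitfile` record, the `accesses_last_week` activity measure and the `hot`/`warm` thresholds are all hypothetical.

```python
from dataclasses import dataclass

# Media ordered from fastest/most expensive to slowest/cheapest.
# Tier names and thresholds are illustrative assumptions.
TIERS = ["magnetic disk", "tape library", "off-line tape"]


@dataclass
class Bitfile:
    name: str
    accesses_last_week: int  # hypothetical measure of "level of activity"


def target_tier(f: Bitfile, hot: int = 10, warm: int = 1) -> str:
    """Pick the medium that matches a bitfile's recent level of activity."""
    if f.accesses_last_week >= hot:
        return TIERS[0]
    if f.accesses_last_week >= warm:
        return TIERS[1]
    return TIERS[2]


def plan_staging(files, current):
    """Return (name, from_tier, to_tier) moves for files on the wrong medium."""
    moves = []
    for f in files:
        dest = target_tier(f)
        if current[f.name] != dest:
            moves.append((f.name, current[f.name], dest))
    return moves


if __name__ == "__main__":
    files = [Bitfile("run_a", 25), Bitfile("run_b", 0), Bitfile("run_c", 3)]
    current = {"run_a": "tape library", "run_b": "magnetic disk",
               "run_c": "tape library"}
    for move in plan_staging(files, current):
        print(move)
```

Each planned move still has to travel over the low-bandwidth channels the paragraph mentions, which is why a policy of this kind only eases the problem and does not remove it.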

These band-aid approaches have gone a long way to help alleviate the problem for the short 
term, but are woefully inadequate for the long haul. The need for wide bandwidth, volumetric 
efficient storage systems is paramount to solve these problems. 

Another factor exacerbating this crisis further is the impact of scientific visualization on
the supercomputer center. This new science in computing has brought about a great many
breakthroughs in terms of solutions to problems that were previously dealt with as great
streams of numbers on print-out paper, but not without a cost. Visualization files are on the
order of 1 GB^3,4 in size each and when animated together produce a major drain on bitfile
storage resources. In many centers it is less expensive to re-run the simulation on the
supercomputer than to store the visualization data. This is further compounded by the types of


^2 Results from a privately sponsored survey of 18 leading supercomputing centers in the US during 1990 by
Data Storage Technologies and CIRRUS Aerospace.

^3 Physics Today, October 1987, "A Numerical Laboratory", Karl-Heinz Winkler et al.

^4 AIAA/NASA Second International Symposium on Space Information Systems, September 1990, "High
Rate Science Data Handling on Space Station Freedom", T. Handley et al.





hardware required to store many of these images, i.e. wide-bandwidth RAID systems optimized
for image transfer.

The icing on the cake in terms of this entire situation is the new fiscal realities that
everyone in the government and private sector is now facing. The days of well funded
initiatives and large departmental budgets are gone and will never return. Today, all decisions
are made in terms of cost as the first priority (the COTS mind set), with all other requirements
a distant second at best. Some additional new requirements that now must be met include
adherence to federally mandated software standards, such as POSIX, GOSIP, and the ever often
cited IEEE Mass Storage Reference Model V4.0.

All of these factors add up to an extremely difficult set of orders that the Systems Integrator
must march to. Within the following section we will come to grips with many of these
problems.


Solving the Quandaries Facing the Systems Integrator Today:

It is our concerted opinion that "Innovation in Approach" is the key to meeting the
challenge at hand. Adherence to tried and true solutions of the not too distant past just isn't
acceptable any longer. Each systems requirement must be met as an entirely new challenge
with no preconceived mind sets dragged along as excess baggage. This philosophy however
must be tempered against the tendency to become romantically attached to the newest latest
greatest technology and mistakenly use it to try and solve a problem for which it was never
intended (as was the case when optical disk was first introduced).

The traditional tools of data storage have been solid state memory (CPU based), rotating
magnetic disk (CPU and network attached), and magnetic tape (on and off-line). For the most
part these tools have suffered from a very conservative design approach in order to achieve
high reliability, most times at the expense of performance and volumetric efficiency, and at high
unit cost.

The basic technologies supporting these tools have seen dramatic improvements in respect
to performance and volumetric efficiency over the past five years, but these benefits have been
primarily passed on to the consumer and PC/workstation markets. In order to solve the
storage dilemmas we find today, these technologies must be applied in a broader sense to the
high performance computing environment. Some instances of this can be seen in the advent of
SSDD's (DRAM based), low-cost RAID systems, and the use of television broadcast helical scan
recording technology for data storage (19mm DD-1 and DD-2, and 1/2" D-3)^5. These
technologies offer a high level of performance in terms of greater signal bandwidth and data
capacity and are highly volumetrically efficient and reliable (99.00+% availability), but have
yet to enter the mainstream in great volume. Tools of this ilk are the saviors of the future in
our opinion.

Other storage technologies that are entering the mainstream are enhanced optical disk and 
the first generation of optical tape drives. Optical disk storage has had a very difficult time in 
penetrating the high performance computing marketplace because of its low bandwidth, long 
latency, and high cost in respect to other technologies. This trend is slowly changing and 
optical disk is expected to have its place in the hierarchy of mass storage in the years to come. 
The interesting new optical technology is that of tape. CREO and ICI have collaborated on an 
early entry with this technology (open reel based) and new offerings are in the works from 
LaserTape, Newell and STK (cartridge based). These systems offer very high capacities in a
small form factor with modest data rates (3-4. b MB/s currently). 


^5 For further information see:

a - THIC, March 1990, "Interfacing 19mm Helical Scan Recording Systems to Computing Environments",
b - 10th IEEE Symposium on Mass Storage Systems (vendor paper), May 1990, "19mm Helical Scan
Recording Technology for Data Intensive Computing Environments"
c - THIC, October 1990, "19mm Data Storage Applications", Richard Lee et al.





It is clear to us that in order to meet the varied needs of the end-user today you must choose
from all of the available storage tools a hierarchy of devices to solve the problem at hand.
Systems of today and into the future will be a hybridization of all of these technologies. Each
device will be used in the hierarchy where best suited (an open systems hardware approach). As
time goes by each component can be upgraded or replaced with the latest-greatest device to
continue the value of the hybrid approach, without scrapping the entire system.

"Innovation in approach" is not strictly limited to hardware or systems architecture. The
choice of software tools is equally as important. There are many approaches to file
management in use today. Some are proprietary single manufacturer approaches and others
are collaborative amongst end-users and manufacturers. All claim to be open architected and
compliant with the emerging IEEE Mass Storage Reference Model (whatever that means^6).
These file management systems must also be compatible with government mandated
standards such as GOSIP and POSIX^7. This does tend to limit the field at this point in time, but
everyone will have to be compliant at some point in the future in order to survive.

Amongst the myriad of commercially available file management software packages
(Andrew, E-Mass, Mesa, UniTree, UNICOS FMS, and others), all approach the
management of bitfiles on a hierarchical basis. Files are mounted, dis-mounted, and migrated
through a hierarchy of storage devices based on frequency of use and relative priority, with
access security threaded throughout. These systems are all adept at their task and differ only in
philosophy and approach. Choosing one of these systems is a much more difficult task than
architecting and configuring the hardware portion of the mass storage system. Attendant with
the need to innovate in terms of approach is the need to reduce storage costs incrementally.
Storage related costs in the supercomputer center are now approaching 50+% of the entire
capital budget in most facilities. The size of the capital budget in the future will diminish, but
the amount of storage required will continue to increase. This mandates the use of low cost
(relative) storage devices and attendant media. Only by aligning the requirements of the
supercomputing data center with emerging mass produced storage technologies can this
dilemma be solved. This points towards technologies that have applications in other, more
commodity driven markets, such as consumer electronics, PC/Workstations and broadcast
television. As mentioned earlier both RAID and helical scan magnetic tape come from these
backgrounds and bring not only higher levels of performance and volumetric efficiency but
substantially lower costs as well (RAID disk = $1-3 per MB, HS Tape = $1.00 per GB).
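
To make the quoted price points concrete, the short Python sketch below works out what they imply for a given capacity. The unit prices are the ones stated in the text; the 100 GB example size and the use of decimal units (1 GB = 1000 MB) are assumptions made here for illustration. The two figures presumably cover different things (a disk subsystem versus tape media), so this is only the comparison the quoted prices themselves imply.

```python
MB_PER_GB = 1000  # decimal units; an assumption, the paper does not say

RAID_USD_PER_MB = (1.0, 3.0)  # "$1-3 per MB" quoted in the text
TAPE_USD_PER_GB = 1.0         # "$1.00 per GB" quoted in the text


def raid_cost_usd(capacity_gb):
    """Return the (low, high) RAID cost range in dollars for a capacity in GB."""
    low, high = RAID_USD_PER_MB
    capacity_mb = capacity_gb * MB_PER_GB
    return (low * capacity_mb, high * capacity_mb)


def tape_cost_usd(capacity_gb):
    """Return the helical scan tape cost in dollars for a capacity in GB."""
    return TAPE_USD_PER_GB * capacity_gb


if __name__ == "__main__":
    capacity_gb = 100  # hypothetical example size
    raid_low, raid_high = raid_cost_usd(capacity_gb)
    tape = tape_cost_usd(capacity_gb)
    print(f"RAID: ${raid_low:,.0f} to ${raid_high:,.0f}")
    print(f"Tape: ${tape:,.0f}")
    print(f"Ratio: {raid_low / tape:,.0f}x to {raid_high / tape:,.0f}x")
```

At these prices the per-byte gap between the two media is three orders of magnitude, which is the economic argument for pushing less active bitfiles down the hierarchy.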

The architecture of most mass storage systems today is comprised of a dedicated File Server
CPU, interconnected on one side to a network or the supercomputer (via a wide bandwidth
peripheral channel) and on the other to a myriad of storage devices/systems. The management
of activity within the dedicated file server is handled by the file management software which
behaves like a large disk drive to the host supercomputer or network. This approach has
proven to be the benchmark today, but is very expensive and wasteful. Many facilities require a
supercomputer similar in capabilities to its host to act as a file server in order to have enough
high speed peripheral interconnects available (the file server acts as a "governor" to
the entire computing facility as it controls the flow and speed of all devices connected to it).

This approach has worked for some time now, but will not survive in its present capacity
into the future. The use of FDDI and HIPPI fabrics with intricate switching networks will soon
obsolete this approach. The elimination of an expensive CPU will be a great cost and time
savings to the supercomputer center. The use of these wide bandwidth "fabrics" will also allow
the interface of new HIPPI/IPI peripherals directly to the host supercomputers. This will speed
up system performance by orders of magnitude.


^6 After two years of work, the IEEE Storage Systems Working Group would just as soon have no one mandate
that a storage system be compliant to the Mass Storage Reference Model (V4.0 or earlier) as the newest thinking is quite
different as to when these "models" were conceived in the minds of the IEEE MSRM executive committee.

^7 The reconciliation of TCP/IP protocols against the OSI FTAM's has created a wide rift in both the end-user
and manufacturer's communities.





We see the future as one where simplicity in approach will be the winning solution. The
ultra intricate, cobbled together, dedicated file servers of today will be replaced by wide
bandwidth, direct connected, volumetric efficient peripherals in the not too distant future.
This approach is the only one which will allow the supercomputer CPU to ever achieve its full
potential and pay back to the end-user and his sponsors.


Conclusions:

Within our brief overview of the Mass Storage marketplace and the "Grand Challenges" that
it presents to the Systems Integrator we have attempted to show that a new order must emerge
in order to meet the end-users' requirements and yet be affordable in terms of procurement, and
flexible in terms of future growth. Only by accepting a new paradigm in terms of architecture
and approach will the supercomputing industry ever be able to harness the ever growing "Crisis
in Mass Storage".






N93-30468


THE MODERN HIGH RATE DIGITAL CASSETTE RECORDER 

Martin Clemow 
Penny & Giles Data Systems Ltd 
The Mill, Wookey Hole,
Wells, Somerset BA5 1BB
England 


INTRODUCTION 

The magnetic tape recorder has played an essential role in the capture and storage of
instrumentation data for more than thirty years. During this time, data recording technology
has steadily progressed to meet user demands for more channels, wider bandwidths and longer
recording durations. When acquisition and processing moved from analogue to digital
techniques, so recorder design followed suit. Milestones marking the evolution of the data
recorder through these various stages - multi-track analogue, high density longitudinal digital
and more recently rotary digital - have often represented important breakthroughs in the
handling of ever-greater quantities of data.

Throughout this period there has been a very clear line of demarcation between data
storage methods in the "instrumentation world" on the one hand and the "computer peripheral
world" on the other. This is despite the fact that instrumentation data, whether analogue or
digital at the point of acquisition, is now likely to be processed on a digital computer at some
stage. Regardless of whether the processing device is a small personal computer, a workstation
or the largest supercomputer, system integrators have traditionally been faced with the
same basic problem - how to interface what is essentially a manually controlled, continuously
running device (the tape recorder) into the fast start/stop computer environment without
resorting to an excessive amount of complex custom interfacing and performance compromise.

The increasing availability of affordable high power processing equipment throughout the
scientific world is forcing recorder manufacturers to make their latest and perhaps most
important breakthrough - the computer-friendly data recorder.

This paper discusses the operating characteristics of such recorders and considers the 
resultant impact on both data acquisition and data analysis elements of system configuration. 








BRIDGING THE GAP 

Traditional multi-track recorders (both analogue and high density digital) take the
timebase of the information to be recorded for granted. The tape runs continuously at an
appropriate speed and data is applied to the input for the duration of the experiment or process.
Just like the trace on a paper chart recorder, the record is in a simple Y-T form, with 'Y' being
represented by the magnetic flux pattern on tape while the 'T' information is contained in the
tape motion itself. If a recorded tape is re-wound and replayed at the same speed and in the
same direction, the output is expected to be a close representation of the original input data,
including its timebase. Timebase compression or expansion can be achieved by increasing or
decreasing the tape speed. Time inversion is also possible by reversing the direction of tape
movement. The important point is that an indication of the passage of time is inherent in the
operation of the classical data recorder.

Until now, this feature has been both a strength and a weakness: a strength in terms of the
ability to manipulate the passage and direction of time on a recorded experiment during the
analysis process, but a weakness when it is necessary to input the data to a computer in
anything but the simplest free-running mode. Given that most computers require data to be
input to disk or memory in chunks at a fixed rate, it is not a simple matter to control the data
flow from a constant speed system efficiently without recourse to time-consuming
stop-reverse-restart routines. In contrast, computer peripherals start and stop rapidly in order to
control the flow of data. This latter attribute would, therefore, appear to be a necessary
characteristic for a data recorder to be considered as computer-friendly.

In addition to fast start/stop of the tape itself, some high rate digital cassette recorders
incorporate input and output data buffering to allow the tape transport to start and stop during
data transfer as necessary. The buffer capacity will be determined by the need to ensure that all
possible sequences of tape movement (ramp up, ramp down, etc.) can be accommodated without
loss of data.

The use of buffered data input/output, while greatly simplifying the actual transfer of data,
introduces more wide-ranging implications than might at first be obvious. For a user to gain
the maximum benefit from the closer integration of the recorder into the computer
environment, it becomes necessary to consider the whole data acquisition and analysis
process rather than just the recorder itself.

If we accept the fundamental principle that computers need to clock data into memory in
bursts by starting and stopping the tape, how are we going to retain the important timebase
information which was so conveniently available by the very movement of the tape on a
continuously running system? This consideration leads naturally on to the actual control of
data. On the command to start, traditional data recorders ramp gently up to speed, lock in and
then data is available on the correct timebase. When told to stop, they ramp gracefully down
again to rest. If "good" data has been recorded on the tape at these ramping points, it is
effectively lost or at least corrupted due to the slewing of the tape speed.

This is clearly unacceptable for reliable data transfer, so a subtle change of emphasis is
needed. It is important now to think in terms of controlling the flow of data - not the
movement of the tape itself.

Computer friendliness also implies reliable and convenient data management. It is
relatively easy to append housekeeping data during recording, but what type of data will be
most useful, and how can it be used to best effect? For example, if the user intends to search his
records by date, time or event, it is critical that he develop an overall strategy for the creation,
logging and management of this type of auxiliary information.


DATA FORMATTING 


Intuitively, it would seem desirable to establish a common data format throughout the data
capture and processing path, if only to avoid the complexity and cost of unnecessary format
conversions. This philosophy requires an analysis not only of the way data is to be recorded,
but of the whole network (both current and planned future expansion) to establish, for
example, the best word width to use (8, 16 or 32 bits). Some recorders support only
8-bit formats while others can be user configured for all three formats. If a common interface
format can be used throughout, the total system can be greatly simplified.


If the source data is serial in nature, it is important to decide carefully where to convert
from serial to parallel. In general, high rate serial recording channels are [...] and
expensive, so it is often best to perform the conversion before recording. Standardizing
on a common data interface format will generally reduce overall system
complexity and cost, with the added benefit of increased flexibility and equipment utilization.





BUFFERED DATA TRANSFER

It is most unlikely that the clock rate of the acquisition process (e.g. analogue-to-digital
conversion) will be identical to that of the analyzing computer. This means that a change of
timebase is almost certain to be required somewhere within the data path. Looking at the
complete system, several important points should be considered. In any recording system, if
the tape is to be used efficiently, data should be recorded on tape at the maximum rated density.

In the case of a continuously running system (longitudinal or rotary), this has
traditionally meant adjusting the tape speed (and scanner speed, if appropriate) to match the
input or output data rate. However, when the recorder incorporates a read/write buffer, it is
usually arranged so that data is written to or read from tape at a single, fixed rate and tape
speed.

Input/output rates below the recorder's specified maximum will result in its buffer filling
or emptying at a slower rate. The recorder accommodates this by automatically stopping the
tape until such time as the level of data in the buffer reaches a pre-determined level. The
rate at which data is written to, and read from, tape is therefore completely independent of
the user data transfer rate. This severance of the traditional direct link between user data transfer
rates and tape read/write rates means that a buffered system can also accommodate data which
is not continuous (i.e. intermittent or burst data) and can operate at any user controlled
transfer rate (continuously variable) within its rated range.
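The start/stop behaviour described above can be illustrated with a short simulation. The following is a minimal sketch under stated assumptions, not a model of any particular recorder: the function name, the rates, the buffer capacity and the restart threshold are all hypothetical values chosen for illustration.

```python
def simulate_buffered_record(user_rate, tape_rate, capacity, start_level,
                             seconds, dt=0.001):
    """Simulate a recorder write buffer (all quantities are illustrative).

    The user fills the buffer at user_rate (bytes/s). The tape drains it at
    the fixed tape_rate (bytes/s), but only while the tape is running. The
    tape starts once the buffer holds start_level bytes and stops when the
    buffer is empty. Returns (peak_level, tape_starts, overflowed).
    """
    level = 0.0
    running = False
    peak = 0.0
    starts = 0
    overflowed = False
    for _ in range(round(seconds / dt)):
        level += user_rate * dt              # data arriving from the user
        if running:
            level -= tape_rate * dt          # data written to tape
            if level <= 0.0:
                level = 0.0
                running = False              # buffer empty: stop the tape
        elif level >= start_level:
            running = True                   # enough data queued: start the tape
            starts += 1
        if level > capacity:
            overflowed = True                # input exceeded what the tape can absorb
            level = capacity
        peak = max(peak, level)
    return peak, starts, overflowed


# A 1 MB/s user stream into a recorder that writes to tape at a fixed 4 MB/s.
peak, starts, overflowed = simulate_buffered_record(
    user_rate=1e6, tape_rate=4e6, capacity=1e6, start_level=5e5, seconds=5)
```

With these assumed numbers the tape starts and stops repeatedly while the buffer level stays well below its capacity; raising the user rate above the fixed tape rate is what eventually overflows the buffer.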

Clearly, the buffered approach would appear to have important advantages for computer
based applications, particularly if the tape drive is specifically designed for very fast start/stop
operation - thereby necessitating only a relatively small data buffer.

An interesting additional benefit, which should not be overlooked, is that buffered systems
do not actually have to be in the normal recording mode (with tape running) in order to capture,
say, an unexpected transient event. They can wait in standby mode until the event commences
and then data can be written to tape from the buffer as previously described. This reduces wear
and tear not only on the recorder itself, but also on heads and media in the case of fixed-head
systems where nothing is in motion until data is transferred from the buffer on to tape.

Similarly, when reading data at a low transfer rate, tape motion only occurs as necessary
to maintain a level of data in the buffer commensurate with the user transfer rate.


AUXILIARY DATA 

While we have seen that the buffered approach has much to commend it with regard to the
handling of different (and perhaps variable) input/output transfer rates and computer entry,
there remains the problem of the consequent loss of relationship between timebase and tape
motion since, as we have already discussed, the tape only moves when data is passing between
tape and buffer. If timing is already intrinsic in the user's data stream - for example, where the
input clock is synchronous with the analogue-to-digital sampling process - only periodic
updates may be necessary in order to keep everything under control. Alternatively, more precise
timing information may be required. Some high rate digital cassette recorders incorporate an
internal clock which is written to a separate (auxiliary) track in the form of a date/time code.
This timing information may subsequently be used to support high speed search during replay.

Another useful method of providing reference information is by using event markers. On
some recorders, the controlling computer can write unique event markers along with event ID
character strings to the auxiliary track. These can be scanned at high speed in order to locate
selected records and also to provide an event log or directory of all events on a tape. With
buffered systems, users should expect this information to be recorded in synchronism with its
associated user data in order to maintain the necessary precise relationship between the
location of the event marker and the data to which it refers.
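As an illustration of how an event log or directory built from the auxiliary track might be used, the sketch below keeps each marker together with the user-data block it was written alongside. The record layout, field names and example IDs are hypothetical; the paper does not define a marker format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class EventMarker:
    timecode: float   # seconds from start of recording (illustrative unit)
    block: int        # user-data block written alongside the marker
    event_id: str     # ID character string supplied by the controller


def build_directory(markers):
    """Return the markers sorted by timecode, as a scan of the auxiliary track might."""
    return sorted(markers, key=lambda m: m.timecode)


def find_event(directory, event_id):
    """Return every marker carrying the given ID string."""
    return [m for m in directory if m.event_id == event_id]


def block_at(directory, timecode):
    """Return the block of the last marker at or before timecode, or None."""
    times = [m.timecode for m in directory]
    i = bisect_right(times, timecode)
    return directory[i - 1].block if i else None


directory = build_directory([
    EventMarker(12.5, 300, "ENGINE_START"),
    EventMarker(2.0, 48, "CAL_PULSE"),
    EventMarker(40.0, 960, "ENGINE_STOP"),
])
```

Because every marker carries the block it was recorded with, a search by ID or by time resolves directly to a position in the user data, which is the relationship the text says must be preserved in a buffered system.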





COMMAND AND CONTROL


Clearly, significant improvements over traditional methods of control of data recorders
are needed if systems are to be integrated successfully into the computer environment.
Typically, commands and status requests pass between the recorder and controller via a
conventional communications interface such as IEEE488 or RS-449.


DATA FLOW

The control of data flow in continuously running systems is relatively simple since it is
only necessary to start the tape running (at the correct speed) and allow data to flow in or out of
the recorder. With buffered systems, however, the movement of the tape itself is a secondary
issue as this process is automatically controlled by the action of the recorder attempting to
empty or fill its buffer. One advantage of a recorder which has been designed with an integral
buffer is that it should not be possible to either overfill or empty its buffer during data transfer
operations.

With continuous inputs, this may simply mean ensuring that the input clock rate does not
exceed the rated maximum for the recorder. If the input is in the form of burst data - blocks of
finite length with gaps in-between - it is generally permissible to exceed the maximum
continuous rate for short periods. In the case of such "burst" data, it is advisable to implement
a "hand-shaking" protocol so that the recorder can control the flow of data within the capacity
limits of its buffer.

On replay, the situation is slightly different since it should be possible for the computer to
control the transfer of data in accordance with its own needs and activities. Here, a
hand-shaking protocol is essential since the mere fact that the computer may have requested data
does not in every case mean that data will be immediately available. Consider the situation
where a new cassette has been loaded into the transport and placed at the beginning-of-tape but
no other tape movement has yet taken place. The computer may request data and offer an
output clock, but the recorder's buffer as yet contains no data. Instead, the recorder will
acknowledge the request for data and immediately start to move tape in order to fill the buffer.
At a certain point, there will be sufficient data within the buffer for an output transfer to
commence. As long as the computer continues to demand data, the recorder will maintain an
appropriate level of data in its buffer, starting and stopping the tape as necessary. At some
point in the transfer process, the computer may decide that it has sufficient data and cease to
request further data. Recognizing this, the recorder will discontinue the reading process
although some valid data may remain in the buffer ready for transfer later.

A convenient method of achieving this is to use a common, bidirectional data input/output
interface including hand-shaking lines which control the flow of data to and from the
recorder. For example, a DATA READY signal may be asserted by the recorder to indicate that
it is ready to receive data and a USER DATA ENABLE may be asserted by the user to indicate
that applied data is valid. When reproducing, a DATA READY signal asserted by the recorder
means that valid data is available, while USER DATA ENABLE is asserted by the user to
indicate that he is ready to accept outgoing data.
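The replay handshake can be sketched as a clocked simulation in which a word moves only on a tick where both DATA READY and USER DATA ENABLE are asserted. This is a minimal sketch, not the recorder's actual interface logic: the function name, the buffer size and the tape read rate of two words per tick are hypothetical.

```python
from collections import deque


def replay(tape_words, user_enable, buffer_size=4, words_per_tick=2):
    """Simulate replay through a buffered recorder (illustrative only).

    tape_words  : the data words recorded on tape, in order
    user_enable : one boolean per clock tick, the USER DATA ENABLE line
    Returns (received_words, data_ready_trace, words_left_in_buffer).
    """
    tape = deque(tape_words)
    buffer = deque()
    received = []
    ready_trace = []
    for enable in user_enable:
        data_ready = len(buffer) > 0          # recorder asserts DATA READY
        ready_trace.append(data_ready)
        if data_ready and enable:             # both lines asserted: transfer
            received.append(buffer.popleft())
        # The recorder moves tape only while the buffer has room to fill.
        for _ in range(words_per_tick):
            if tape and len(buffer) < buffer_size:
                buffer.append(tape.popleft())
    return received, ready_trace, list(buffer)


received, ready_trace, leftover = replay(
    tape_words=[1, 2, 3, 4, 5, 6],
    user_enable=[True, True, True, False, False, True])
```

In this run the first request finds DATA READY low because the buffer is still empty, transfers pause while the user withdraws USER DATA ENABLE, and valid words remain in the buffer at the end, mirroring the sequence described in the text.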





MEDIA 


Our discussion hitherto has dealt with the general issues involved in adding the
attribute of "computer-friendliness" to data recorders and is basically independent of the choice
of media. The trend throughout all classes of recording is towards the use of standard cassettes.
There is actually a paradox here since modern open-reel tapes can contain an enormous
amount of data and represent the most efficient method of storage by volume. (Remember that
every cassette in effect contains an empty reel of similar volume to the media itself.) Open reels
may not be convenient to load or keep free from contamination and are therefore considered
"unfriendly" by some users. Conversely, cassettes are convenient to load, both manually and
automatically, and their acceptance is now almost universal.

Although a full discussion on the range of cassette media is beyond the scope of this 
particular presentation, many equipment designers now elect to use commercially available 
multi-sourced cassettes rather than to develop custom-designed media for reasons of 
economics and availability. 






N93-30469


TOWARDS A 1000 TRACKS DIGITAL TAPE RECORDER


J. M. Coutellier, J. P. Castera, J. Colineau, J. C. Lehureau
Thomson-CSF, Laboratoire Central de Recherches
Domaine de Corbeville
Orsay Cedex F-91404
France

Thomson Consumer Electronics, R&D France
Illkirch F-67403
France



INTRODUCTION

As the demand for high data rate (up to 1 Gb/s), high density (down to 1 µm²/bit) tape
recorders increases, the main trend of investigation is the improvement of the well known helical
scan concept. The drawbacks of this technology are also well known: sophisticated mechanics,
head to tape contact and wear problems. In our fixed head approach, the recorder mechanics are
made much simpler, but the complexity is shifted to the integrated magnetic
components, which have to record and reproduce hundreds of tracks in parallel. Our
multiplexed inductive write head and magneto-optical readout head will be described, and the
global system performance evaluated.


RECORDING HEAD

To avoid the impractical number of connections necessary for individually addressing a
large number of tracks, the heads have been arranged in a matrix array of rows and columns.
A conventional addressing technique is used to multiplex the recording process. Each head is
located at the crossing of two coils, the row wire being used to feed the data to be recorded, and
the column wire to select the desired elements. The present multiplexed write component is
composed of 32 (data) x 12 (selection) = 384 heads.
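The saving in connections from matrix addressing can be made concrete with a small sketch. The 32 data lines and 12 selection lines come from the text; the mapping of track numbers onto the matrix and the function names are assumptions made for illustration only.

```python
DATA_LINES = 32     # row wires carrying the data to be recorded
SELECT_LINES = 12   # column wires selecting which heads write


def head_address(track):
    """Map a track index to (data_line, select_line).

    The ordering of tracks across the matrix is an assumption.
    """
    if not 0 <= track < DATA_LINES * SELECT_LINES:
        raise ValueError("track out of range")
    select_line, data_line = divmod(track, DATA_LINES)
    return data_line, select_line


def write_cycles(bits):
    """Split one bit per track into SELECT_LINES cycles of DATA_LINES bits each."""
    if len(bits) != DATA_LINES * SELECT_LINES:
        raise ValueError("expected one bit per track")
    return [bits[s * DATA_LINES:(s + 1) * DATA_LINES]
            for s in range(SELECT_LINES)]


total_heads = DATA_LINES * SELECT_LINES      # 32 x 12 heads
wires_needed = DATA_LINES + SELECT_LINES     # rows plus columns
```

Under this arrangement 44 wires address 384 heads, and one bit for every track is written in 12 successive selection cycles rather than through 384 individual connections.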

A planar head technology has been developed for the thin film pole realization. The top
part of the head is then a flat surface, about 7 x 3 mm in size, designed to record on 8 mm MIl
tape. The bottom part of the head is a mix of conventional ferrite grooving, coil winding, glass
fusion and polishing. The recorded track width is 18 µm.

A two-beam interference method, using monochromatic light, has been used to
characterize the head to tape contact, with about 10 nm resolution. The interference pattern
is formed between the tape and a dummy glass component on which the protuberant magnetic
pole shape of the multitrack head has been reproduced. In this experiment, the tape can be
static or running. It has been shown that temporally and spatially homogeneous close contact can
be achieved between a moving magnetic tape and a large active area head. The head to tape
average spacing is directly correlated to the tape roughness (about 50 nm, measured by atomic
force microscopy) and does not vary significantly with the applied pressure (typically around
2 x 10^4 Pa). Therefore, the head to tape contact is as good as it is for a rotating head.

Signals recorded with a multiplexed head have been compared to signals recorded with a
state of the art 8 mm MIG single track head. The output/current curves show a similar
maximum output level for both heads. For optimized currents, output/frequency curves are
identical.



READOUT HEAD 


The head to tape speed is very low in our system: 2.6 cm/s. Only an active readout device
can then be considered.

A simple transducer has been realized to pick up the magnetic flux from the full width of
tape: on a GGG substrate, 2 magnetic layers are separated by a non-magnetic gap layer. The tape
runs on the edge of this 3 layer assembly. The magnetization change in one of the layers,
due to the proximity of the recorded tape, is analyzed using the well known Kerr effect. The full
magneto-optical device is then made of a laser diode, the magnetic sensor, a few lenses or
mirrors, a polarizer and a linear CCD. The laser spot is focused on the full sensor width, in
such a way that each track magnetization will be imaged on a different CCD pixel. No laser
beam deflection is needed.

The signal to noise ratio of this head is proportional to the laser diode power, the
magnetic efficiency and the figure of merit of the transducer. The use of Sendust for the sensor
magnetic layers, and the optimization of the layer thicknesses, has led to a 4-% magnetic
efficiency. The figure of merit of the transducer has reached 4 . 10^. With a 50 mW laser diode, a
26 dB peak to rms full band signal to noise ratio has been obtained, good enough to reproduce
20 Mb/s on our present demonstrator. The recorded bit length is 0.5 µm.
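The quoted figures can be cross-checked with a short calculation. This is a worked sketch under one assumption not stated in the readout section: that the readout head reproduces the same 384 tracks (32 x 12) written by the recording head. The tape speed, bit length and track width are taken from the text.

```python
tape_speed_um_per_s = 2.6e4   # 2.6 cm/s head to tape speed
bit_length_um = 0.5           # recorded bit length
track_width_um = 18.0         # recorded track width
tracks = 384                  # assumed: 32 x 12 tracks read in parallel

# Each track passes tape_speed / bit_length bits under the sensor per second.
bits_per_track_per_s = tape_speed_um_per_s / bit_length_um

# All tracks are imaged onto the CCD at once, so the rates add.
channel_rate_bit_per_s = bits_per_track_per_s * tracks

# Area occupied by one recorded bit, ignoring any guard band between tracks.
cell_area_um2 = track_width_um * bit_length_um
```

The result is 52,000 bit/s per track and just under 20 Mbit/s in aggregate, with a bit cell of 9 µm², which shows how a very low tape speed still yields a useful data rate when hundreds of tracks are read in parallel.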


DIGITAL PROCESSING 

A conventional 8-10 modulation code has been used to adjust the channel to the
magneto-optical head characteristics. The output signal has to be equalized and the clock has to be
recovered for each independent track. It has been done at a reasonable cost by multiplexing all
tracks in a pipelined architecture. The signal is digitized right at the CCD output.


SYSTEM PERFORMANCE 

The raw bit error rate measured for the overall system is in the range 10'®. A Reed-Solomon
error correcting code has also been implemented, and the system interfaced with a video codec. A
digital video demonstration is now set up in our laboratory.


CONCLUSION 

A new concept of fixed head recording has been demonstrated, with state of the art
performance. The advantages of such a system over conventional rotating heads are
numerous. The simple and reduced mechanics involved will lower the price of the recorder.
The low head to tape speed tremendously decreases head wear and tape damage. For space
applications, the absence of the gyroscopic effect due to a high speed rotating drum and the possibility
of backward readout may be essential.





N93-30470


EVOLUTION OF A HIGH PERFORMANCE STORAGE SYSTEM
BASED ON MAGNETIC TAPE INSTRUMENTATION RECORDERS

Bruce Peters
Datatape Incorporated
360 Sierra Madre Villa
Pasadena, CA 91109


INTRODUCTION

In order to provide transparent access to data in network computing environments,
high-performance storage systems are getting smarter as well as faster. Magnetic tape
instrumentation recorders contain an increasing amount of intelligence in the form of
software and firmware that manages the processes of capturing input signals and data, putting
them on media and then reproducing or playing them back. Such intelligence makes them
better recorders, ideally suited for applications requiring the high-speed capture and playback
of large streams of signals or data.

In order to make recorders better storage systems, intelligence is also being added to provide
appropriate computer and network interfaces along with services that enable them to
interoperate with host computers or network client and server entities. Thus, recorders are
evolving into high-performance storage systems that become an integral part of a shared
information system.

Datatape has embarked on a program with the Caltech sponsored Concurrent Supercomputer
Consortium to develop a smart mass storage system. Working within the framework of the
emerging IEEE Mass Storage System Reference Model, we are building a high-performance
storage system that works with the STX File Server to provide storage services for the Intel
Touchstone Delta Supercomputer. Our objective is to provide the required high storage
capacity and transfer rate to support grand challenge applications, such as global climate
modeling.

REQUIREMENTS 

Reliable, high-performance storage is a basic requirement of emerging network computing
systems used for analytical problem solving applications. With the advent of mixed media
data types, including computational digital movies, storage must accommodate bitfiles in
excess of one gigabyte. In order to move these bitfiles in and out of storage without bottlenecks,
transfer rates must exceed ten megabytes per second. Access time must be predictable within
reasonable human-interaction parameters, which normally means seconds or minutes. Access
must be provided with high data integrity, on the order of one error in 10E12 bits or better. Data
security must be provided through controlled access.

The Concurrent Supercomputer Consortium has these requirements. In order to support the
n-dimensional, nonlinear modeling of the grand challenge applications, large bitfiles up to 100
gigabytes and high transfer rates at HIPPI speed (up to 50 megabytes per second) are needed.
Data integrity must reach the order of one error in 10E15 bits. Standard interfaces (e.g., HIPPI)
and protocols (e.g., the IPI-3 command set for high-performance virtual disk) are needed.
Compartmentation of data, such as storing separate bitfiles or classes of bitfiles on separate
removable media and providing controlled access to the media, is an acceptable approach to
data security.
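To see what these integrity figures mean for a single large bitfile, the expected number of bit errors can be estimated as file size in bits divided by bits per error. This is a rough sketch with two assumptions: the notation "10E12" and "10E15" is read here as 10^12 and 10^15 bits per error, and a gigabyte is taken as 10^9 bytes.

```python
def expected_errors(file_bytes, bits_per_error):
    """Expected number of bit errors when reading a file once."""
    return file_bytes * 8 / bits_per_error


GB = 1e9  # decimal gigabyte (assumption)

# A 100 gigabyte bitfile at the two integrity levels quoted in the text.
errors_at_1e12 = expected_errors(100 * GB, 1e12)
errors_at_1e15 = expected_errors(100 * GB, 1e15)
```

Under these assumptions a 100 gigabyte bitfile would see on the order of one error per read at the lower integrity level, but fewer than one error per thousand reads at the higher one, which suggests why the larger bitfiles call for the stricter figure.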

Thus, storage must be provided as a subsystem characterized by the full range of systemic 
parameters, such as performance, functionality, security, and reliability, maintainability, 
and availability. 






BASIC STORAGE SYSTEM

Initially, Datatape will supply the Consortium with two DCTR-LP400 Digital Cassette Tape
Recorder/Reproducers (LP400s) capable of sustained transfer rates up to 400 Mbps, two
Variable Rate Buffers (VRBs) capable of buffering 384 MB of data with burst transfer rates up to
480 Mbps, and two HIPPI interface modules. Protocols and commands are being developed to
manage the control and data paths.

The LP400 is a high-performance 19mm magnetic tape recorder that is capable of recording
and reproducing digital data at rates from 50 to 400 Mbps on the small, medium or large
commercially available D-1 tape cassette. The large tape cassette stores nearly one terabit of
data. The LP400 handles wideband data via 8- or 16-bit parallel I/O and complies with the
ANSI ID-1 format with a bit error rate of 1 error in 10E10 bits. Each set of four tracks (a track
set) is addressable by the corresponding track-set identification, recorded in the control track.
Local control is provided via a remotable control panel. Remote command and status
operation is provided via IEEE-488 or RS-422 interfaces.

The Variable Rate Buffers (VRBs) are used to extend the recorders from being instrumentation
recorders to being computer peripherals. A VRB and an LP400 recorder make up a peripheral
storage unit. The VRB transfers data to and from the host computer via a HIPPI interface in
bursts determined by the specific characteristics of the host interface, and it transfers data to
and from the recorder in the continuous streams that the recorder uses. The VRB features
automatic rewrite, whereby bad or marginal areas of tape are skipped over. This feature
enhances data integrity to better than 1 error in 10E12 bits. In addition to HIPPI, the VRB can
accommodate other host interfaces such as SCSI, SCSI-II and FDDI.
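A few back-of-envelope timings follow from the figures quoted for the LP400 and the VRB. This is a sketch under stated assumptions: megabytes and megabits are decimal, the large cassette is taken as exactly one terabit, and the burst case assumes the host sends continuously at 480 Mbps while the recorder drains the buffer at 400 Mbps, which is one possible usage pattern rather than a documented operating mode.

```python
MBIT = 1e6

tape_rate = 400 * MBIT      # LP400 sustained rate, bit/s
burst_rate = 480 * MBIT     # VRB burst rate, bit/s
buffer_bits = 384e6 * 8     # 384 MB buffer (decimal megabytes assumed)
cassette_bits = 1e12        # "nearly one terabit" on the large cassette

# Time to fill (or replay) a large cassette at the sustained rate.
cassette_seconds = cassette_bits / tape_rate

# How long a full-rate burst can last before the buffer fills,
# when the buffer gains only the difference between the two rates.
burst_seconds = buffer_bits / (burst_rate - tape_rate)

# Time for the recorder alone to empty a full buffer.
drain_seconds = buffer_bits / tape_rate
```

With these assumptions a large cassette holds about 2,500 seconds (roughly 42 minutes) of data at the sustained rate, a full-rate burst could run for about 38 seconds before the buffer fills, and a full buffer drains to tape in under 8 seconds.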

The HIPPI interface module has separate data and control interfaces, supporting peer-to-peer
data transfers. The HIPPI data interface is fully compatible with the relevant HIPPI standards.
It is a dual simplex configuration: either the receiver or the transmitter can function
separately, but not both at the same time. The data path is 32 bits wide and transfers data at a
rate of 100 MBps. The HIPPI control interface is an Ethernet port. The command/status set is
modeled after the Maximum-Strategy version of the IPI-3 command set for disk arrays.


STORAGE SYSTEM EXTENSIONS 

In parallel with the development of the wide-bandwidth interface, Datatape is developing
additional functionality to improve and extend the storage systems' performance and
scalability. For example, third-loop EDAC will be added to the VRB to improve the corrected
bit error rate to 1 error in 10E15 bits.

Robotic library capabilities are being developed to evolve this magnetic tape peripheral storage
system into a hybrid, hierarchical mass storage system. Our plan is to provide capabilities as a
Physical Volume Repository or Physical Volume Library, interfacing in the Storage Server to
support data transfers using either physical or logical file names. A carousel-and-picker
module is being designed to handle multiple sizes of cassettes. A feature will be provided to
enable multiple, independent accesses to a single carousel, which virtually guarantees access to
any cassette. It also extends the storage-capacity growth potential of the system. The control
framework is being structured to accommodate heterogeneous storage drives and media, such
as magnetic and optical disk, as well as magnetic tape.





FUTURE SMART STORAGE 


The ultimate goal is a coherent, balanced high-performance storage system that can be
configured and adapted to specific operational, technical and economic requirements. Such a
system will meet the basic goals of the IEEE Mass Storage System Reference Model: open
architecture for general purpose storage systems; applicability to distributed systems as well as
centralized and standalone systems; and scalability. In addition, such a system will provide
services to facilitate the collection, processing, analysis and dissemination of data. Examples
include signal processing, data reduction and enrichment, compression and encryption.




[Figure: CALTECH SUPERCOMPUTER ARCHITECTURE. Block diagram showing the Intel Touchstone Delta, a HIPPI switch, a gigabit network, a HIPPI frame buffer, the STX file server (UniTree) with disk array buffers and Exabyte tape drives, Ethernet and serial control links, and two peripheral storage units, each made up of a HIPPI interface module, a VRB and a DCTR-LP400. HIPPI links are labelled as 32 bit wide data transfer at 100 MBps (dual simplex).]


N93-30471


MASS OPTICAL STORAGE - TAPE (MOST)

William S. Oakley

SM Gremmradow Way

San Jose, CA 90134

BACKGROUND

In today's large mainframe and supercomputer environment there exists a continuous
demand for increased performance in digital storage systems. The user need for near-line
storage capacity is currently doubling every four years. In addition to higher capacity, a
desire exists for higher data transfer rates and longer term database archivability, at lower
and lower cost. Each component of this quartet of demands appears to be insatiable.
Magnetic tape technology presently dominates the digital mass storage markets, but the
continuous growth of requirements is drawing attention to the limitations of the technology
as an archival mass storage medium. The lowest cost option currently available for long term
data storage is to use magnetic tape, although it is not well suited to meeting the need for many
tens of years of reliable storage. Today, the majority of magnetic tape mass storage systems
are based on the IBM 3480/3490 (or compatible) tape drives. These drives offer only moderate
transfer rates and relatively small increments of storage, both of which create a logistics
problem due to the large numbers of cartridges necessary in a typical system and the time
taken to transfer data. Both higher cartridge capacity and data transfer rates are available in
some helical scan magnetic tape systems; however, these command a substantially higher
price, exacerbating the cost problem, and are not compatible with most installed systems or
tape databases.

To speed data access, a variety of IBM 3480 compatible robotic tape cartridge servers have been
implemented with capacities up to 6,000 cartridges. These provide more acceptable access
times, but individual cartridges continue to provide only small capacity increments.
Although this is the industry preferred solution at present, it results in massive, expensive
robotics, with slow access to only a few terabytes of storage, and does not address the needs of
very long term archivability or higher data transfer rates.

A surge of interest in optical storage has recently been created by the advent of optical discs.
These have been seen as a potential solution to the storage capacity problem and a few large
systems have been implemented which employ up to 14 inch diameter discs storing several
gigabytes of data per side. At present, the data transfer rates of existing drives are very low,
typically under a megabyte per second, and their cost remains high. Regardless of this, optical
disc technology has found some interest in the storage community, and sales of optical disc
systems are expected to reach over $2 billion by 1996. Both write once and erasable
technologies are available, with most current interest going to the smaller erasable
magneto-optic systems.

For larger systems, the few gigabytes of storage offered by a single optical disc is much too low,
and robotic mechanical "jukeboxes" have been implemented containing hundreds or perhaps
thousands of discs. These systems are generally also inadequate, offering only a temporary
and incomplete solution, due to limited data transfer rates, slow disc access, large physical
size, and substantial cost. Reliability and maintainability problems also exist due to the
mechanical nature of the approach and the need for many disc interchanges. A much better
solution is required for large system mass storage.

It is inevitable that the tape storage market will soon see the introduction of very competitive
products based on Digital Optical Tape (DOT™) technology. This will be particularly true in
those market segments requiring large databases, due to the markedly superior archival
properties and storage capacities of optical tape. Several manufacturers are introducing
various types of write once (WORM) archival optical tape media. Little interest appears to
exist at this time for erasable tape, although both magneto-optic and dye polymer erasable
tape have been demonstrated. This lack of interest in erasable tape stems from the current
industry orientation toward archival mass storage systems, and could change in the future if
digital TV video recorders adopt optical tape technology. In the near term, the emergence of
optical tape products, with their price/performance benefits over magnetic systems, will
probably greatly expand both the capabilities, and the market volume, of high end tape based
systems.

LaserTape Systems Incorporated has been researching a new digital storage product based on
laser writing onto optical tape. This Digital Optical Tape System (known as DOTS™) is
targeted at the large computer mass storage market. The DOT™ system utilizes the IBM 3480
removable cartridge, which contains about 160 meters of one mil (0.001 inches) thick, half
inch wide tape. This length of tape can provide up to 50 Gigabytes of user data when allowance
is made for error correction (ECC), track spacing, headers, etc. Half mil thick tape is also
available, and provides nominally 100 Gigabytes of user data per cartridge. The initial
systems use laser diodes of 830 nanometer wavelength as the write/read source, and provide a
one micron recorded bit size. Thus the optical tape areal storage density is the same as for
optical discs.

The input/output (I/O) data transfer rates of the LaserTape drive can be between 6 and 15
Megabytes per second, depending on the particular optical tape media used. Several different
types of optical tape media exist both from U.S. and foreign sources, and all are compatible
with the LaserTape system. The archival lifetime of all media types is expected to
substantially exceed 25 years, and tape vendors are working towards establishing 100 year
archivability. Bit error rates, after correction, are expected to be better than 10’ for the
LaserTape drive.

Rapid increases in both cartridge capacity and system data rate are anticipated in the future.
Short wavelength lasers operating in the green, blue, and U.V. regions of the spectrum are in
development, and these sources will enable approximately an order of magnitude increase in
both tape areal storage density and data transfer rate. By 1995 the DOTS™ technology should
be capable of storing about half a Terabyte in a single 3480 cartridge, with I/O data transfer
rates of over 100 Megabytes per second.


SYSTEM CONFIGURATION CONSIDERATIONS 

Today's large systems are configured using three level memory systems. The central
processing unit (CPU) interacts with local fast random access (RAM) primary memory, which
itself interchanges data with on-line secondary (disc) storage. The disc storage is loaded on
demand with the desired files from tertiary storage, which is invariably either off-line
operator or robotically accessed tape. Future system architectures will probably retain this
basic hierarchy with changes occurring at each functional level as storage technology
advances. CPUs are migrating to multiprocessor systems, such as Thinking Machines Corp.'s
new CM-5, a massively parallel computer which uses up to 2,000 of Sun Microsystems' SPARC
microprocessors. To support these and similar systems RAM is becoming larger and faster.
Distributed systems are rapidly becoming the order of the day as networks effectively become
the system backplane.

Today's rotating disc systems will someday be replaced by larger capacity, much faster
systems using different technology such as, perhaps, optical holographic storage. This type of
system is already in development as evidenced by the Holostore system being developed by
MCC in Austin, Texas. This approach potentially offers several gigabytes of storage with
microsecond access times. The Holostore technology uses volume holographic storage in
optical crystals. It is a page oriented device that writes and reads data in a two dimensional
optical form using a laser source. The system is physically small, has no moving parts, and is
a parallel access device capable of very high transfer rates.





Development of high data rate mass storage systems is not, of course, dependent on the
success of the Holostore technology. Magnetic disc systems currently available substantially
meet the needs of such a system, except for some I/O rate and latency limitations in the RAM
to disc interface. Current wisdom has it that each MegaFlop (a million floating point
arithmetic operations per second) of processing power requires about five megabytes/second
of I/O. This means that today's 2 MegaFlop (about 10 MIPS) RISC microprocessors require I/O
rates of 10 Megabytes per second from RAM. It follows that the input data rates from either
the Holostore, or from disc to RAM, should be similar. Even given a certain amount of reuse
of data in RAM or disc for a particular computation, the data I/O rates required from tertiary
storage are not significantly lower. This is particularly true if the average file size
substantially approaches the secondary storage capacity, requiring frequent file transfers.
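
The rule of thumb quoted above can be written as a one line calculation. The following Python sketch is illustrative only; the function and constant names are assumptions introduced here, and the only input taken from the text is the figure of about five megabytes/second per MegaFlop.

```python
# Rule of thumb quoted in the text: about 5 megabytes/second of I/O
# for each MegaFlop of processing power.
IO_MBYTES_PER_S_PER_MFLOP = 5.0


def required_io_rate(mflops: float) -> float:
    """Return the I/O rate in megabytes/second implied by the rule of thumb."""
    return mflops * IO_MBYTES_PER_S_PER_MFLOP


if __name__ == "__main__":
    # The 2 MegaFlop RISC microprocessor example from the text.
    print(required_io_rate(2.0), "megabytes/second")
```

The same function can be applied to other processor ratings, bearing in mind that the rule is an approximation and that reuse of data already held in RAM or on disc reduces the rate actually needed from slower storage levels.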

For supercomputer and mainframe systems, the CPU rates are in the hundreds of MegaFlops,
and file sizes are often in many hundreds of megabytes. This results in a need for tertiary
storage I/O rates in the range of a hundred megabytes per second. Access time to a required
data set is also a factor of considerable importance. To support this compute intensive
environment, secondary storage using either discs or a Holostore system will be required with
a storage capacity of one or two gigabytes. Downloading bitfile data sets to such a system will
require frequent transfers of hundred megabyte size files. File transfers of this size can be
required every few tens of seconds, i.e. on a continuous basis. To support systems of this
nature, a tertiary storage system of hundreds, or perhaps thousands, of gigabytes is required,
with continuous transfer rates in the range of a hundred megabytes per second, and access
times of about ten seconds.

One possible means of addressing this need is the use of disc arrays. The number of
independent disc drives is determined by either the cumulative size of the memory desired, or
the cumulative I/O rate desired. Data transfer rates of magnetic disc drives permit high
system data rates with only a few drives. However, a multi terabyte memory requires so many
disc drives as to be impractical. Disc arrays only offer modest memory sizes and therefore do not
fit the mass storage need.

The introduction of optical tape drives with a user capacity of a hundred gigabytes per 3480
cartridge and with a data transfer rate of 15 MBytes/second potentially provides a better
solution. A system comprising only a single tape drive, with an autoloader containing ten 100
Gigabyte tapes, provides access to a terabyte of data in just a few tens of seconds. This is not a
vision for the far distant future. The 3480 cartridge autoloaders, the basic tape moving
system, and the optical head fabrication technology already exist. All that remains is to
combine the available technology assets into an optical tape drive.


THE TECHNOLOGY 

The basic technology to be implemented in producing a high performance tape drive is mostly
a combination of two standard technologies. The tape drive mechanism is essentially a
standard IBM 3480 compatible magnetic system modified to use optical tape. Virtually all of
the standard tape control electronics, including the tape velocity servo, is utilized, and
virtually all of the tape movement mechanics is preserved. The standard magnetic head is, of
course, removed and replaced by an optical write/read head. The optical head mostly uses
technology which has been proven in the optical disc industry. Standard, although improved,
optical focus and tracking techniques are used to maintain track following and beam focus on
the moving tape. The basic system read/write scheme is shown in Figure 1.

The linear stop/start tape system of the IBM 3480 compatible is fully preserved and the
system operates at 3 meters per second, between the standard tape speeds of 2 and 4 meters per
second. The high data rate is achieved by optically writing a transverse column array of one
micron diameter optical bits, with all bits in the array being written simultaneously. Having
written a column array of perhaps 48 or 64 bits, the normal tape advance allows a subsequent
array to be written in less than a microsecond. In this manner data rates of the order of 100
to 150 Megabits per second are achievable. Figure 2 shows a transverse multibit column
located between two servo tracks. Servo and data bits are written simultaneously. Data
recording is achieved by individual modulation of each bit in the array at about a two
megahertz rate. As all transverse bits in an array are recorded simultaneously, an array of 64
data bits, for example, would give a data rate of about 128 Megabits per second.

With a 1.5 micron longitudinal (down tape) spacing, a 3 meter per second tape speed gives a 2
MHz bit rate for each bit in the transverse array. A bit center to center transverse spacing of
1.7 microns is used and arrays of 32 and 80 bits per track are planned, corresponding to user
data rates of 6 and 15 Megabytes per second. For the 80 bits wide column the written swath
width is about 0.15 millimeters, which permits up to approximately 80 separate swaths to be
written across the 12.5 mm tape width. This allows about 6,400 bits to be stored across the
tape width, and is the primary reason the system has such a high storage capacity per
cartridge.
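
The recording geometry described above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted in the text (3 meters per second, 1.5 and 1.7 micron spacings, 12.5 mm tape width, 0.15 mm swath width); the function and constant names are assumptions made for illustration. It computes raw channel rates, so the quoted user rates of 6 and 15 Megabytes per second appear as three quarters of the raw figures, presumably reflecting the ECC and format allowance mentioned earlier (an inference, not a statement from the text).

```python
TAPE_SPEED_M_PER_S = 3.0       # tape speed quoted in the text
DOWN_TAPE_SPACING_UM = 1.5     # longitudinal bit spacing, microns
TRANSVERSE_SPACING_UM = 1.7    # center to center spacing across a column, microns
TAPE_WIDTH_MM = 12.5           # half inch tape
QUOTED_SWATH_WIDTH_MM = 0.15   # swath width quoted for an 80 bit column


def per_bit_rate_hz() -> float:
    """Modulation rate of each bit position in the transverse array."""
    return TAPE_SPEED_M_PER_S / (DOWN_TAPE_SPACING_UM * 1e-6)


def raw_rate_mbytes_per_s(bits_per_column: int) -> float:
    """Raw (pre-overhead) data rate for a column of the given width."""
    return bits_per_column * per_bit_rate_hz() / 8.0 / 1e6


def data_column_width_mm(bits_per_column: int) -> float:
    """Width occupied by the data bits alone, excluding servo tracks."""
    return bits_per_column * TRANSVERSE_SPACING_UM / 1000.0


def swaths_across_tape() -> int:
    """Whole swaths of the quoted width that fit across the tape."""
    return int(TAPE_WIDTH_MM / QUOTED_SWATH_WIDTH_MM)


if __name__ == "__main__":
    print("per-bit rate (MHz):", per_bit_rate_hz() / 1e6)
    for bits in (32, 80):
        print(bits, "bit column raw rate (MB/s):", raw_rate_mbytes_per_s(bits))
    print("80 bit data width (mm):", data_column_width_mm(80))
    print("swaths across tape:", swaths_across_tape())
```

The data bits of an 80 bit column occupy slightly less than the quoted 0.15 mm swath, the remainder being available for the servo tracks, and the number of swaths that fit is consistent with the "approximately 80" stated above.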

The multi track format implemented permits quasi random access to data in that it is not
necessary to read the entire tape in a sequential manner. A transverse motion of the optical
head assembly allows each of the 80 individual tracks to be directly accessed, thus providing
some aspect of parallel access. Each of the approximately 80 tracks contains 1.5 gigabytes of
data and is individually identified by coding within the track format, as is the track position
along the tape. By this means rapid access to any known file location can be achieved. The
location of data within a tape is identified by placing both swath number identity and 'down
tape' position data in the servo tracks. This allows the system to continually validate its
location on the tape.

The average access time to data on any tape, once loaded, is one third of the end to end
tape time of 110 seconds. For a 100 gigabyte capacity cartridge this provides an average access time
of about 37 seconds. If an autoloader of ten cartridges were to be used, the time to exchange
cartridges will be about 8 seconds, thus providing an average access time of nominally 45
seconds to any data. Use of shorter tape lengths in a cartridge can obviously reduce average
access times within a cartridge.
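
The access time figures above (110 seconds end to end, about 37 seconds average, 8 seconds for a cartridge exchange) are consistent with a simple model in which the current and target positions are each uniformly distributed along the tape, so that the mean travel is one third of the tape length. The Python sketch below illustrates that model; the uniform distribution is an assumption made here, and the function names are illustrative.

```python
END_TO_END_S = 110.0          # end to end tape time quoted in the text
CARTRIDGE_EXCHANGE_S = 8.0    # autoloader exchange time quoted in the text


def mean_separation_fraction(steps: int = 200) -> float:
    """Mean |x - y| for x and y on a uniform grid over [0, 1].

    For continuous uniform positions the exact value is 1/3.
    """
    points = [(i + 0.5) / steps for i in range(steps)]
    total = sum(abs(x - y) for x in points for y in points)
    return total / (steps * steps)


def average_access_s(include_exchange: bool = False) -> float:
    """Average access time within a loaded tape, optionally adding an exchange."""
    access = END_TO_END_S * mean_separation_fraction()
    if include_exchange:
        access += CARTRIDGE_EXCHANGE_S
    return access


if __name__ == "__main__":
    print("within a loaded cartridge (s):", round(average_access_s(), 1))
    print("including an exchange (s):", round(average_access_s(True), 1))
```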


BIT ARRAY GENERATION 

The basic technology of recording onto optical tape has been successfully demonstrated by
LaserTape and the key system feature is the means of bit array generation. A variety of
methods can be implemented to generate the desired array of modulatable diffraction limited
bits. One approach fabricated and tested at LaserTape was based on acousto-optic
multifrequency diffraction (references 1, 2). In this technique a number of radio frequency
acoustic waves are input to an optically transmissive crystal by means of a piezoelectric
transducer, and form a corresponding set of travelling optical diffraction gratings in the
crystal. When illuminated by a coherent optical source, each RF frequency and the resulting
diffraction grating in the crystal forms an optical beam at a specific diffraction angle, and
thereby a corresponding spot position in the tape plane. Binary modulation of the input RF
driving voltage intensity modulates the optical spot, resulting in data recording on the tape.

This technique was implemented in a system fabricated and tested at LaserTape in mid 1991,
and successfully demonstrated writing and reading to and from optical tape at data rates
equivalent to 6 Megabytes per second. Only 8 of the design's 22 frequency channels were
electronically supported, but the system validated the technology of writing/reading to/from
rapidly moving optical tape with diffraction limited spot sizes. A limitation of the system
was the complexity resulting from implementing the digital modulation and multiple beam
generation in the same device. As described in reference 2, several cross modulation effects
occur due to using the acousto-optic device for the dual purposes of modulation and beam
steering. Techniques were designed which mostly compensated for these effects at the cost of
increased electronic complexity. However, the basic parameters of acousto-optics and the
complexities arising from the cross modulation limit the usefulness of acousto-optic systems
to user data rates below 6 Megabytes per second. An inherently better approach is one in
which the multiple optical beam generation and beam modulation occur in separate devices.
Designs of this nature are now being pursued, based on the numbers and modulation rates
described above.


TECHNOLOGY GROWTH OPTIONS 

The initial product planned by LaserTape provides up to 100 Gigabytes of user storage with
data transfer rates of up to 15 Megabytes per second. This system is based on available laser
diodes operating at a wavelength of 0.83 microns. Use of a shorter wavelength laser source
enables a proportionately smaller written spot size, which in turn provides greater storage
densities and possibly also higher data rates.

One proposed future implementation employs a frequency doubled Neodymium crystal laser
outputting in excess of 100 milliwatts at a wavelength of 0.53 microns (green). The current
state of the art in this technology is 140 milliwatts cw, with rapid power increases anticipated
in the near future. The amount of optical power available will determine the data transfer
rate for a given optical system and tape media sensitivity. For the most sensitive of the
available media, a nominal data transfer rate of 100 Megabytes per second should be achieved
with 120 milliwatts of average input laser power, allowing for system transmission factors.
This 800 megabit per second transfer rate matches the needs of the emerging fibre channel
data nets as planned under the U.S. Government HPCC program.

The use of the shorter wavelength will allow recording of 0.5 micron spots with 0.64 micron
spacing. For a system using a tape velocity of 4.0 meters per second, a bit spacing of 0.64
microns implies a data rate of 6.25 Megabits per second for a single writing point. A data rate
of 100 Megabytes (800 Megabits) per second therefore requires that 128 column array data bits
be written simultaneously across each swath, giving a swath width of about 80 microns. This
in turn will permit a storage capacity of up to 500 gigabytes per cartridge.
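
The figures in the paragraph above follow from the tape velocity and bit spacing. The Python sketch below reproduces that arithmetic; it assumes the same 0.64 micron spacing applies across the column as along the tape, which the text implies but does not state separately, and the function names are illustrative.

```python
TAPE_SPEED_M_PER_S = 4.0    # tape velocity quoted in the text
BIT_SPACING_UM = 0.64       # bit spacing quoted in the text


def single_point_rate_mbit_per_s() -> float:
    """Data rate of one writing point, in megabits per second."""
    return TAPE_SPEED_M_PER_S / (BIT_SPACING_UM * 1e-6) / 1e6


def bits_per_column(target_mbytes_per_s: float) -> int:
    """Number of simultaneous writing points needed for the target rate."""
    target_mbit_per_s = target_mbytes_per_s * 8.0
    return round(target_mbit_per_s / single_point_rate_mbit_per_s())


def swath_width_um(column_bits: int) -> float:
    """Swath width, assuming the same spacing across the column."""
    return column_bits * BIT_SPACING_UM


if __name__ == "__main__":
    bits = bits_per_column(100.0)
    print("single point rate (Mbit/s):", single_point_rate_mbit_per_s())
    print("bits per column for 100 MB/s:", bits)
    print("swath width (microns):", swath_width_um(bits))
```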

Increased laser power will enable a greater number of bits to be written in parallel with
correspondingly higher data rates. The present optical systems are estimated to be capable of
track widths of 200 microns, potentially providing up to 250 megabytes per second write rates
if sufficient laser power were to be available. A source laser power of 1/4 watt would be
sufficient to provide a data rate of 200 megabytes per second. This data rate matches the full
transfer rate of the HIPPI data communications protocol. It is clear that a significant portion
of any program to produce an ultra high performance tape drive should be expended in
optimizing the laser source.

Further into the future, laser sources in the ultra violet region of the spectrum may be
anticipated. These may provide bit sizes of 0.2 microns with 0.25 micron spacing, allowing
even higher data rates and storage capacities of well over a terabyte per cartridge. A bit
spacing of 0.27 microns corresponds to an areal density of 10,000 megabits per square inch,
which may well be achieved during this decade. Eventually, spectrally selective media and
write/read techniques will potentially increase both of these parameters by another two or
three orders of magnitude.





CONCLUSIONS 


It can be concluded that the burgeoning demand for mass storage, quick accessibility, and high
I/O data rates is probably best met by optical tape systems. These are the only systems that
can provide the universal quartet of requirements (capacity, I/O rate, archivability, and cost
effectiveness) at acceptable levels in the foreseeable future.


1. Korpel, Acousto-optics: A Review of Fundamentals.
IEEE Proceedings, Vol. 69, No. 1, Pgs 46 to 53, January 1981.

2. D. Hecht, Multifrequency Acoustooptic Diffraction.
IEEE Transactions on Sonics and Ultrasonics, Vol. SU-24, No. 1,
January 1977.






FIGURE 1 BASIC SYSTEM FUNCTIONS










FIGURE 2 LINEAR RECORDING FORMAT
(ONE EIGHT BIT DATA COLUMN SHOWN)

(Diagram labels: direction of tape, write zone, top guide track, 8 bit vertical data columns,
region of many other multibit tracks, bottom guide track, edge of tape.)



N93-«0472 


ICI OPTICAL DATA STORAGE TAPE -
AN ARCHIVAL MASS STORAGE MEDIA

Andrew J. Ruddick
ICI Imagedata
Brantham
Manningtree, Essex CO11 1NL, U.K.


1. Introduction

At the 1991 Conference on Mass Storage Systems And Technologies ICI Imagedata presented a 
paper which Introduced ICI Optical Data Storage Tape. This paper placed specific emphasis on 
the media characteristics and initial data was presented which illustrated the archival 
stability of the media. 

This paper covers more exhaustive analysis that has been carried out on the chemical stability
of the media. Equally important, it also addresses archive management issues associated
with, for example, the benefits of reduced rewind requirements to accommodate tape
relaxation effects that result from careful tribology control in ICI Optical Tape media.

ICI Optical Tape media has been designed to meet the most demanding requirements of
archival mass storage. It is envisaged that the volumetric data capacity, long term stability
and low maintenance characteristics demonstrated in this paper will have major benefits in
increasing reliability and reducing the costs associated with archival storage of large data
volumes.




2. Summary Of ICI Optical Tape Media Characteristics

The general characteristics of ICI Optical Tape media have been discussed in many other
conferences and presentations (e.g. reference 1) and will not be presented in detail here. The
features are summarized in the table below and inspection of these indicates the suitability of
optical tape for mass storage. The remainder of this paper focuses specifically on the media
characteristics that relate to the reliability of the media for archival applications.


Table 1. Cost/Performance Features of ICI Optical Tape

Low on-line cost        10¢/MB - 40¢/MB depending on format
Low media cost          0.5¢/MB - 1¢/MB falling with time
Rapid access            2GB - 20GB per sec. depending on format
High data rate          >3MB/sec
Volumetric efficiency   Factor 10 higher than advanced helical magnetic
Indelible media
Unlimited read          >40,000 rewind cycles
Long media life         >30 years





3. Archive Life - Analysis Of Media Stability


3.1 Extended Battelle Testing

Previously published data on ICI Optical Tape has discussed results from accelerated ageing
performed at the Battelle Institute in the Battelle Class II test (reference 2). This historical data
has clearly demonstrated media lifetimes in excess of 15 years for the product. More recent
evaluation has extended the period of testing and lifetimes in excess of 30 years are now
predicted. The analysis is discussed in detail below.


3.1.1 Testing Regime

The accelerated ageing test was carried out on full length reels of ICI 1012 Optical Tape
packaged in a glass flanged reel.

Prior to the test, blank and written areas of the tape were characterized with a map of BER using
the Creo 1003 Optical Tape Recorder (OTR). This was done on 3 metre long sections of the tape at
three points along the tape length corresponding to inner diameter, mid diameter, and outer
wraps of the wound flanged tape.

The tape was aged in an environment of mixed corrosive gases for a period of 60 days as defined
in the Battelle Class II test. Previous work by Battelle Institute has generated a correlation
factor which shows this to be equivalent to 30 years in a "typical" office environment.

Following accelerated ageing the data at each section of the tape was then re-read on the OTR
and the BER compared with initial maps from the unaged sample. In addition the blank areas
were re-mapped, and data was then written in these areas and the BER determined.


3.1.2 The Results

The BER maps obtained are given in figure 1. For simplicity only data from the outer wrap
section of the tape is shown. Our previous experience from the Battelle test indicates that this
is where degradation is most rapid and this data represents, therefore, the worst case. In these
maps the BER measured for each record is plotted against position along the 3 metre sample
section.

In summary, inspection of the data shows that the BER of written data does not increase
significantly on ageing. A small increase is observed consistently down the length of the
sample. The cause of this has not yet been identified and will be further investigated. At all
points of the sample, however, the data remains fully correctable and well within the limits of
(raw BER of fix 10-4). The blank regions of the tape indicate a detectable increase in BER
during ageing. However, data subsequently written in these areas is also fully correctable.

In summary, this result indicates that ICI Optical Tape, both blank and with written data, has a
lifetime well in excess of 30 years.











3.2 Arrhenius Tests


3.2.1 Test Method 

Previously reported Arrhenius testing of ICI Optical Tape media has used the measured change
in reflectivity and CNR to obtain a lifetime prediction in excess of 300 years at 20°C (68°F) and
60% RH (reference 3). More detailed testing reported here has used a measured degradation in
BER as the definition of failure. This is believed to be a more sensitive test of media
deterioration.

One difficulty in carrying out accelerated ageing on tape samples is that exposure to the elevated
temperature and humidity required for rapid test results can warp, and ultimately embrittle,
the polyester base film to an extent that the media can no longer be wound onto the optical tape
recorder for read/write testing. To overcome this experimental problem, by inspection we have
found that the major cause of failure in our media after rapid ageing is the corrosive growth of
pinholes in the alloy reflector layer. Consequently, a microscope image analysis technique
was developed in order to quantify the growth of defects in the reflector. This technique was
not affected by mechanical degradation of the base. On unaged samples a correlation was then
developed between the pinhole count (quantified by area fraction of the inspected sample) and
corrected BER as measured on the Creo 1003 Optical Tape Recorder. From this correlation a
pinhole count "failure point" could be defined. This was equal to an area fraction of 1 x
This point was then used to define the failure of aged samples inspected via image analysis.

The accelerated ageing was carried out by exposing short strips of tape media to a range of
temperatures (95°C (203°F), 90°C (194°F), 80°C (176°F)) at a fixed RH of 70%. The tape samples
contained 2GB of written data split equally at either end of the sample in order to assess any
difference between ageing of written and unwritten areas. For each temperature condition the
sample was removed from the chambers and inspected by microscopic image analysis every 3
days. The pinhole count at ten points on the strip was taken, including areas of written data.
This was done avoiding areas of the sample obviously affected by handling damage. The
average of these ten readings was used as a measure of the sample degradation.


3.2.2 The Results 

Typical data obtained is given in figure 2. This is data from exposure at 90°C (194°F), 70%RH.
It can be seen that failure occurs catastrophically after exposure for an extended period,
allowing ready definition of the time at which the failure point was exceeded - in this case 42
days. Media tested at other temperatures gave plots of a similar characteristic (for simplicity
data plots not shown).

From these results the failure data is plotted In figure 3 in accordance with analysis via 
Arrhenius kinetics. 

Inspection of this graph allows estimation of an activation energy associated with the failure
mechanism. This is calculated as 1.36 eV.
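
Arrhenius analysis assumes that the time to failure scales with temperature as exp(Ea/kT). The Python sketch below shows only that scaling, under the assumptions of a single thermally activated failure mechanism and no humidity dependence. It does not reproduce the fitted line of figure 3, and the function names are introduced here for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a Celsius temperature to kelvin."""
    return temp_c + 273.15


def acceleration_factor(ea_ev: float, test_temp_c: float, use_temp_c: float) -> float:
    """Ratio of time to failure at use_temp_c to that at test_temp_c.

    Assumes pure Arrhenius behaviour with activation energy ea_ev (in eV).
    """
    inv_use = 1.0 / celsius_to_kelvin(use_temp_c)
    inv_test = 1.0 / celsius_to_kelvin(test_temp_c)
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (inv_use - inv_test))


if __name__ == "__main__":
    # Relative lifetime between two of the test temperatures quoted in the text.
    factor = acceleration_factor(1.36, test_temp_c=90.0, use_temp_c=80.0)
    print(f"lifetime at 80 C relative to 90 C: {factor:.2f}")
```

Because the factor depends exponentially on the activation energy, small errors in the estimated energy produce large changes in any lifetime extrapolated to storage temperatures.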

Although the data is not presented explicitly in this report, it was noted that comparison of
pinhole growth between written and unwritten areas of the sample revealed no significant
difference. This is entirely consistent with the results from Battelle testing, which also
indicate that written and unwritten media age at the same rate and via the same mechanisms.








The activation energy of 1.36 eV compares very favourably with 1.5 eV quoted by Sony for their
rigid optical disc "Century Media". Based on this activation energy media lifetimes of 500 years
can be predicted for tapes stored under controlled conditions of 18°C (65°F) and 70%RH. Taking
into account the large errors associated with calculation of activation energies and
extrapolation of such predictions it is, perhaps, more reasonable to conclude that the data fully
supports lifetime claims in excess of 100 years.


4. ICI Optical Tape - Predicted Rewind Requirements 

The data in section 3 addresses issues of chemical stability in the media structure. From this
we can conclude that the media is intrinsically very robust in both written and unwritten
forms and we can predict extremely long lifetime for the ICI Optical Tape product.

Equally important to the lifetime of the data, however, are tribological effects which, if not
properly controlled, can cause damage to tape media in archive due to pack distortion, and in
the worst case creasing and cinching of the tape reel, destroying its functionality and the data
stored within it. This is a very well known phenomenon to archivists of magnetic media and
can result in high costs associated with good archive management.

This section of the paper addresses analysis and modelling work carried out within ICI 
Imagedata which builds on the magnetic media experience and illustrates how the tribology of 
our media has been designed to overcome these detrimental effects. 


4.1 Background 

The purpose of the programme is to assess the length of time over which tapes may be safely
stored prior to suffering distortion caused by tension relaxation. It is well known that over
time tension relaxes due to the phenomenon of creep. As the tension relaxes, so the interlayer
pressure decreases, which in turn impairs the ability of a tape reel to withstand the stresses
associated with normal drive operation. In practice this would be evident as longitudinal
interlayer slippage during acceleration or deceleration.

Other failure mechanisms may also occur, particularly in regions where the cumulative effects
of interlayer pressure set up tangential compressive stresses within the tape. In such regions
layers of tape may actually separate to form voids (cinching). All forms of loss of pack integrity
are to be avoided as the end result is likely to be localized degradation at best and complete loss
of data at worst.


4.2 Test Method

Due to the long term nature of the effects described above it is necessary to use predictive
techniques to assess and develop media characteristics. Once the product is defined it is then
possible to commit to a lengthy assessment for true archival performance.

The predictive technique which we have adopted is based on accelerating the rate of creep in
order to cause failure. An independent measure of true creep rate can then be used to estimate
the time which would have been taken under normal storage conditions. This is the approach
developed by Eschel and Bertram (reference 4) in their study of the archival properties of
magnetic media. The results of this work are widely recognized as providing the best guidelines
for maintaining magnetic tapes in long term storage.





The procedure Is as follows: - 


I) A long term measure of the true rate of creep of the media must be made. This requires a
high resolution measurement of tape extension (a few microns per week) which must be
isolated from variations in temperature and humidity for the duration of the test.

II) The prediction of lifetime to failure requires not only creep data, but also a measure of
the relaxation in interlayer pressure from initial winding to the point of failure. This is
best estimated by accelerating the rate of creep and determining the change in pressure
corresponding to the point at which the reel is no longer able to withstand normal
handling on the drive. In practice this is assessed by braking a spinning reel at
aggressive decelerations (580 rad/s² compared with 20 rad/s² on the drive); any
indication of interlayer slip (illustrated by a chalk line along the radius) is taken as the
failure point.

The method of measuring interlayer pressure is also that discussed by Eschel and
Bertram. Thin (25µm, 0.001") stainless steel tabs are inserted into the reel during initial
winding. The force required to pull the tabs from the wound reel, together with the
measured friction between tape and tab, provides an estimate of the interlayer pressure.

III) The measures of percentage relaxation to failure and true creep rate are used to predict
the lifetime of a tape in storage. The estimation relies on an extrapolated fit for the
creep extension of the substrate. Based on creep data generated by Bogy et al (reference 5)
the total strain can be shown to follow an empirical dependence on time. Eschel and Bertram use
this empirical approximation to derive an expression for the predicted time to failure,
Equation 1:


Where:

t_failure is the estimated time to failure,
E is Young's Modulus (3531 N/mm², 5*10^5 psi),
%P is the percentage pressure relaxation at failure and
Δε/σ is creep strain per unit stress at time t1.
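
The tab pull measurement described in step II can be sketched as follows. The text states only that pull force and measured friction together give an estimate of interlayer pressure; the specific relation used below (simple Coulomb friction acting on both faces of the tab, so that pull force = 2 x friction coefficient x pressure x contact area) is an assumption made here for illustration, as are the function name and the example values.

```python
def interlayer_pressure_pa(pull_force_n: float,
                           friction_coefficient: float,
                           contact_area_m2: float) -> float:
    """Estimate interlayer pressure from the force needed to pull out a tab.

    Assumed model (not taken from the paper): Coulomb friction on both
    faces of the tab, i.e. F = 2 * mu * P * A, solved for P.
    """
    if friction_coefficient <= 0.0 or contact_area_m2 <= 0.0:
        raise ValueError("friction coefficient and area must be positive")
    return pull_force_n / (2.0 * friction_coefficient * contact_area_m2)


if __name__ == "__main__":
    # Hypothetical example values, chosen only to exercise the function.
    pressure = interlayer_pressure_pa(pull_force_n=2.0,
                                      friction_coefficient=0.25,
                                      contact_area_m2=1.0e-4)
    print(f"estimated interlayer pressure: {pressure:.0f} Pa")
```

Repeating the estimate on tabs pulled before and after accelerated creep would give the percentage pressure relaxation (%P) used in the lifetime prediction.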



4.2 Results

4.2.1 Creep Rate 

The rate of creep of fourteen samples of Optical Tape media has been measured over a period of
208 days. The tapes have extended at an average creep strain per unit load of 5.0*10'^ /psi.
These figures can be compared with a minimum of 2.4 *10'^/psi and a maximum of 6.5*10‘^/psi
for various polyester substrates measured under similar conditions in Reference 5.

4.2.2 Pressure Relaxation 

Accelerated creep tests have been carried out by maintaining the reels at an elevated
temperature of 45°C (113°F)/50%RH. Tapes are periodically removed, conditioned to ambient
and tested for slippage during deceleration at 580 rad/s².

For the full reel, failure occurred after the pressure had relaxed by 80%-85%. This level of
reduction is much greater than the 50% level predicted in the literature for magnetic media.
The interlayer pressure at failure is remarkably low given the large inertias experienced by the
full length reels. It is clear that the tribology developments of ICI Optical Tape media which
were initially directed to give excellent tape handling and wear, and to allow easy
transportation, will also give good pack integrity at the low interlayer pressures that can develop
in archive.


4.2.3 Predicted Rewind Interval 

Extrapolations for a full length reel, based on Equation 1, are shown in Table 2 below. The
creep figures are based on average creep plus two standard deviations. The results for the tapes
studied in Ref 4 are shown for comparative purposes. Estimates are given for percentage
relaxations of 50% and 80%. Clearly, being able to maintain reel integrity at low interlayer
pressures has a dramatic effect on extending the rewind interval.

The predicted rewind interval of 39 yrs should not be taken literally but rather as an
indication of the relative advantage of pack integrity at low interlayer pressures.


Table 2. Estimated Rewind Interval

                     Creep Strain       Young's      t_failure
                     per Unit Stress    Modulus      (yrs)
                     (/psi)             (psi)

Full Reel 50%        6.5*10^?           5*10^5       3.1
Full Reel 80%        6.5*10^?           5*10^5       39
Ampex 50% (Ref 4)    7.5*10^?           5*10^5       3.5


Given the many assumptions and aggressive levels of deceleration which have been used to
derive the storage lifetime, we believe that further work is required to establish reliably an
upper limit. Based on the results generated so far it is nevertheless possible to say that a period
of 5 years represents a safe, conservative interval for tapes stored in a well maintained
archive. It is fully expected that further analysis will extend this prediction to 10 years or
more. Since creep is very sensitive to temperature, there will be severe implications where
storage is under elevated temperature conditions.


4.3 Rewind Period in Archive - Conclusions

The conclusion from this work is that rewind intervals of 5 years can be safely assumed for ICI
Optical Tape full length reels where storage conditions are maintained at 18°C (65°F), 50%RH.
This is currently believed to be a very conservative analysis. However, further work is required
for more accurate predictions.

This length of time, together with the ease and speed of retensioning, represents a relatively
low level of maintenance. The critical factors in determining this are the rate at which the
media creeps and the robustness of the pack at low interlayer pressures, which is due to
carefully controlled surface chemistry between overcoat and backcoat layers in the wound
pack specific to the ICI media structure.






5. Summary 

Data presented in this paper show that the chemistry and tribology of ICI Optical Tape media
have been carefully designed to create a media ideally suited for the requirements of a low cost,
low maintenance, high reliability data archive. In summary, using industry standard tests the
following characteristics have been demonstrated:

I. A media structure stable well in excess of 100 years under ideal storage conditions

II. A media lifetime in excess of 30 years in the presence of corrosive gases, typical of
the standard office environment (testing will continue in order to identify the
failure point)

III. Tape rewind periods that are well in excess of magnetic media requirements,
allowing for reduced archive management costs.

Combined with the unsurpassed volumetric capacity and low cost that can be achieved with
optical tape, we believe these archival performance characteristics make it an ideal medium
for many mass storage applications.


6. References 

1. J. F. Duffy, OPTICAL DATA STORAGE TAPE: A NEW HIGH DENSITY DATA STORAGE AND
ARCHIVE MEDIUM. Presented at THIC, 1/2 October 1991.

2. W. H. Abbott, THE DEVELOPMENT AND PERFORMANCE CHARACTERISTICS OF MIXED
FLOWING GAS TEST ENVIRONMENT. IEEE Trans. on Components, Hybrids and
Manufacturing Technology, 11, pp. 22-35 (1988).

3. R. A. McLean, J. F. Duffy, ICI Optical Data Storage Tape. NASA Conf. on Mass Storage
Systems and Technologies, 1991.

4. N. Bertram and A. Eshel, RECORDING MEDIA ARCHIVAL ATTRIBUTES. Final Report to
RADC, TR-80-123, April 1980.

5. D. B. Bogy, N. Bugdayci and F. E. Talke, EXPERIMENTAL DETERMINATION OF CREEP
FUNCTIONS FOR THIN ORTHOTROPIC POLYMER FILMS. IBM J. Res. Develop., 23,
pp. 450-458 (1979).






N93-30473 


Flexible Storage Medium For Write-Once Optical Tape


Andrew J. G. Strandjord, Steven P. Webb, Donald J. Perettie, and Robert A. Cipriano


The Dow Chemical Company
Central Research
1702 Building
Midland, Michigan 48674


Abstract


A write-once data storage media has been developed which is suitable for optical tape applications.
The media is manufactured using a continuous film process to deposit a ternary alloy of tin,
bismuth, and copper. This laser sensitive layer is sputter deposited onto commercial plastic web
as a single-layer thin film. A second layer is sequentially deposited on top of the alloy to enhance
the media performance and act as an abrasion resistant hard overcoat.

The media was observed to have laser write sensitivities of less than 2.0 njoules/bit,
carrier-to-noise levels of greater than 50 dB, modulation depths of ~100%, read-margins of
greater than 35, uniform grain sizes of less than 200 Angstroms, and a media lifetime that
exceeds 10 years.

Prototype tape media was produced for use in the CREO drive system. The active and overcoat
materials are first sputter deposited onto three mil PET film in a single pass through the vacuum
coating system, and then converted down into multiple reels of 35mm x 880m tape. One mil PET
film was also coated in this manner and then slit and packaged into 3480 tape cartridges.


1. Introduction

Optical data storage is quickly becoming a viable and often preferred option to magnetic
storage. The promise of high data densities and archival stability has initiated the
development of thin-film optical media which can be substituted into applications where
magnetic technology is inadequate. For example, optical tape offers storage densities of
over one terabyte on a single 12 inch diameter reel of 3 mil film (35mm x 880m). This is
equivalent to ~5,000 reels of standard magnetic tape [1]. CREO Electronics Corporation [2] has
developed a commercial drive for optical tape and is currently using the write-once media
developed by ICI (Digital Paper) [3,4] for developmental evaluation. LaserTape Systems,
and others, are in the process of developing new drive technology for the use of optical tape
packaged in 3480-type cartridges (1/2" x 570ft). This would offer multi-gigabyte storage
capacities to the broader consumer market.


Years of developmental work at The Dow Chemical Company have produced a flexible
optical media which can be used in tape applications. The media has been shown to
have many superior properties with respect to the other media being developed for optical
tape. This paper reports on some of the results which have been collected recently.


2. Optical Tape Requirements

Tape is considered a non-rigid media and must meet several special handling requirements
which are not major issues for rigid media, such as discs and cards. Tape media must be
sufficiently flexible to accommodate motion around the small hubs and rollers associated
with tape handling without degradation of the layered structure. Polyester films which range
in thickness from about 1/2 to 3 mils have been shown to have the necessary physical and
optical properties required by current drive technologies. Active coatings which are sputter
deposited onto these substrates must also be sufficiently flexible to withstand the
mechanical handling associated with winding and unwinding of the tape. Coatings which are
either thick or rigid are susceptible to cracking and delaminating, and are therefore not
suitable for the production of optical tape.

The cleanliness and smoothness of the substrate materials are critical issues which
can affect performance of the media. Misinterpretation of debris and surface
non-uniformities can lead to error rates which exceed the level which can be handled by the
correction code of the drive. Bit-error-rates (BER) of up to ~10^? can currently be
accommodated by the CREO drive, but lower levels are preferred. These concerns can be
addressed by pre-cleaning the substrate and/or subbing the base PET with an organic layer in a
controlled environment.

Another critical requirement, which is relevant for all storage media, is that the data
must remain environmentally stable for long periods of time. This is especially true for those
media which are being targeted for archival storage applications. For example, the optical
tape standard set by CREO requires a 15 year media lifetime in an office environment
(25°C/50%RH) [1].

Premature aging of many thin film media has been shown to occur by several different
mechanisms: 1) the active layer can degrade by such processes as oxidation or phase
segregation, to make the media less sensitive to laser writing and/or increase the
bit-error-rate; 2) the written data spots can change with time to cause edge deformation or
phase reversal, thus reducing the playback signals; or 3) the media could mechanically
degrade due to wear and abrasion during media handling. The first two forms of environmental
degradation are strongly dependent on the composition and structure of the active layer
within the media, though some stabilization can sometimes be achieved by overcoating this
layer with a topcoat. Protection from frictional wear, due to film contact with itself and
the roller mechanisms, is best afforded by a hard thin-film overcoat. However, most
write-once media that have a hard overcoat in direct contact with the active layer are
insensitive toward laser writing.


The current laser write sensitivity requirements for optical tape are less than 2
nanojoules per bit for 35mm tape (CREO) and less than 1 njoule per bit for the 3480 tape
format (LaserTape).


3. Dow Optical Media

The new write-once optical data storage media developed at The Dow Chemical Company
exceeds these requirements for tape applications. The recording layer is a
ternary metal alloy system containing tin, bismuth and copper in a weight percent ratio of
70/25/5. This layer is preferably deposited by plasma sputtering from a cast target.

The morphology of the sputtering target is a complex distribution of metallic and
inter-metallic phases which are present in varying amounts. Figure 1 is a back-scattering
electron image of the machined sputtering target. The predominant phase is a tin-rich
matrix with 4-5 wt% incorporated bismuth. The bright phase seen in the figure consists of
large areas of segregated bismuth (<1% Sn) which predominates at the grain boundaries of
the Sn-rich phase. The needle-like structures are the copper containing phases. They consist
of a core of Cu3Sn (ε-CuSn) surrounded by Cu6Sn5 (η-CuSn) [15].

The relative proportions and distribution of these different phases are a function of the
temperature history of the sputtering target during casting and machining. Fast cooling of
the target results in a fine phase structure, while slow cooling allows the phases to grow
into larger crystallites.

The heating behavior of the SnBiCu alloy is depicted in the calorimetric trace (DSC) shown
in Figure 2. The relatively low temperature endotherm at 143°C is near the SnBi eutectic
temperature [15]. The alloy becomes pasty in consistency at this point and can be
characterized as a soft, mobile state which contains many of the properties of both a solid
and a liquid.

The low temperature mobility associated with the SnBiCu alloy (especially with regard to the
bismuth phase) implies that small amounts of heat at the surface of the alloy during
deposition can have a pronounced influence on the composition of the final thin film media. If
the target temperature approaches 150°C, the phases have an opportunity to both segregate
and migrate along the thermal gradients within the alloy. Compositional analysis of a
sputtering target after a series of extended high power depositions, in which the temperature of
the target surface was well above 143°C, showed evidence of this type of elemental migration.
The Cu containing phases tended to migrate from the surface into the bulk, while the Sn
concentration was found to increase at the surface. Films made during coating runs with
this target were found to be rich in bismuth, relative to the expected initial target
composition. By routine process monitoring and proper bonding of the target to a cooled
backing plate, temperature-dependent deposition can be uniformly controlled.


4. Manufacturing

The medium is manufactured using a continuous film process where the alloy is
sputter deposited directly onto thin polymeric web. A simplified schematic of the web coating
system is shown in Figure 3. This coater is configured to sputter coat up to three layers at a
time using three separate DC magnetron cathodes. Also located in the system is a
"pre-glow" station for ionized gas cleaning of the substrate before coating. Each of these four
stations (mini-chambers) is isolated from each other in space, thereby producing a local
environment for the containment of the plasma gasses. This allows separate processes
to be carried out simultaneously at each station without cross contamination between the four
sputtering sources, or alternatively allows for the incorporation of multiple targets of one
material to increase the production rate.

All of the critical process parameters are continually monitored during the coating
process to ensure consistent and uniform coatings. Down web reflectance and
transmission spectra are collected as the primary quality control feedback loop of the
system. Film properties are easily maintained to within 1% of their targeted value for the
duration of the coating run.

The equipment described above was used to refine the coating parameters and to produce
samples for market testing before moving the process over to a much larger production
machine.


5. Laser-Write Sensitivity

The laser-written data bits, in this write-once system, are in the form of non-reflective spots
on a reflective background. The mechanism for spot creation was found to be different than a
purely ablative and/or evaporative process. In these films, the alloy melts and flows out of the
laser irradiated area, thereby opening up a dark pit or hole [16,17]. The medium is sensitive
toward laser writing over a broad wavelength range and will therefore be compatible with
lower wavelength lasers as that technology develops (see Figures 4-5).

Laser write sensitivity measurements were made using several different techniques. Figure
6 shows the relative laser-write sensitivities of the SnBiCu films as a function of reflectivity.
These measurements were made using a static test apparatus in which a 10 milliwatt diode
laser is focussed onto the surface of the media (see Figure 7). To determine the laser-writing
sensitivity at constant power, the pulse width is decreased until the data spot can no longer be
visualized. Digital spots were produced on the Dow Media with energy levels of less than 2
nanojoules per bit.
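The relation between laser power, pulse width and energy per bit that underlies this constant-power measurement can be sketched as follows. This is only an illustration, assuming a rectangular pulse, one pulse per bit and no optical losses between the laser and the media; the function names `pulse_energy_nj` and `threshold_pulse_width_ns` are hypothetical and are not part of the test apparatus.

```python
def pulse_energy_nj(power_mw: float, pulse_width_ns: float) -> float:
    """Energy of one rectangular laser pulse in nanojoules.

    1 mW * 1 ns = 1e-3 W * 1e-9 s = 1e-12 J = 1e-3 nJ.
    """
    return power_mw * pulse_width_ns * 1e-3


def threshold_pulse_width_ns(power_mw: float, energy_nj: float) -> float:
    """Pulse width (ns) that delivers a given energy at constant power."""
    return energy_nj / (power_mw * 1e-3)


if __name__ == "__main__":
    # 10 mW diode laser, as in the static tester described above.
    for width_ns in (250, 200, 100):
        print(width_ns, "ns ->", pulse_energy_nj(10, width_ns), "nJ")
    # Pulse width corresponding to the 2 nJ/bit level at 10 mW.
    print(threshold_pulse_width_ns(10, 2.0), "ns")
```

Under these assumptions, a 10 mW pulse must be shorter than 200 ns to deliver less than 2 nJ to a bit.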

Tape drives are presently being developed to use media which have reflectivity levels between
35 and 55%. The laser-write sensitivities shown in Figure 6 are observed to be very
similar for all of the media samples made with reflectivity levels between 30 and 60%. This
invariance makes the SnBiCu media very versatile towards fulfilling new media
requirements which may occur as drive technology evolves and can also allow this
media to be used in totally new, film-based media formats, i.e. optical floppies, laminated
cards, and discs.

Dynamic sensitivity measurements were made on an APEX Optical Media test system [18].
Under normal operating conditions, the APEX system automatically focuses and tracks on
compact disc type media. The 3 mil films produced for optical tape were converted into
a form which was compatible with the APEX test system. Specifically, the film was cut into
discs which were ~4 1/2" OD x 1 1/2" ID and then sandwiched between two mirror-flat
polycarbonate discs (~50 mil thick) to create a sample in which the optical path was similar
to a compact disc. Additionally, this 3 mil film media does not contain tracking information.
Therefore, the auto-tracking feature of the equipment is unusable and continuous reading
of data relies on the inherent stability of the test stand to keep the laser head over the data
during subsequent revolutions of the disc/film after writing. Table vibrations, and the like,
eventually cause the data to drift out of the field of view of the read laser beam. The APEX
system will, however, consistently focus on the media, as long as the media have been carefully
sandwiched to form a "flat", wobble free surface.

The Dow Optical Media has been successfully evaluated using the APEX test system. Typical
performance evaluations have shown modulation depths of ~100% and carrier-to-noise
levels of greater than 50 dB at 10 milliwatt/250 nsec laser settings, 30 ft/sec
media speeds, and 30 kHz spectrum analyzer bandwidths (see Figures 8-9).

Threshold values were obtained for the media by changing the laser energy and measuring
both the modulation depth and carrier-to-noise level at each new setting (see Figures 10 &
11). Both curves show a drop off in the performance as the laser write energy is
decreased beyond the 2 nanojoule level. The spot shapes are observed to be very uniform in
size with clearly defined borders (see Figure 12).

Several reels of the Dow Optical Tape were sent to CREO for further read-write evaluations.
They were able to write successfully on the media using laser pulse widths of less than
115 nsec. An optical image of the laser written data is shown in Figure 13. The read margin
for this media is shown in Figure 14. The width of the curve at the bottom indicates the range
of threshold values (focus, laser power, etc.) at which the data can be read with low error rates.
Margin widths of 35 and greater are judged as superior by CREO. A value of between 36 and 39
is estimated for the Dow Media.


6. Environmental Stability

Media stability was evaluated by subjecting the films to various environments of high
temperature and high humidity. The goal of these accelerated aging experiments is to
extract a lifetime for the media at room temperature and room humidity. Lifetime is
defined as the time period in which the media remains usable relative to a set of media
standards. For the SnBiCu films, it was assumed that the media are acceptable as long
as they remain within 10% of their original reflectivity specifications. Therefore,
determining the time it takes a sample to degrade to 90% of its original value, as a
function of water and temperature, will allow for the calculation of the lifetime at room
temperature and humidity (20°C and 50% RH). Experiments run under isobaric (constant
water) aging conditions have shown that no significant temperature effect can be discerned
between 20°C and 100°C (see Figure 15). This simplifies the calculation of the lifetime to
include only the effect of water.

The data plotted in Figure 16 show the smooth relationship between the environmental water
concentration and the lifetime of the media. The open square represents the extrapolated
lifetime at ambient conditions. Lifetimes of greater than 10 years have been predicted for
the media.
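The kind of extrapolation described here, a straight-line fit of ln(10% decay time) against ln(water concentration) extended down to ambient conditions, can be sketched as follows. The data points are hypothetical and are not values from this paper; the ambient water vapor pressure of about 8.8 Torr (20°C, 50% RH) and the helper names `fit_loglog` and `extrapolate_lifetime` are likewise assumptions made for illustration only.

```python
import math


def fit_loglog(water, decay_time):
    """Least-squares line ln(t) = slope * ln(w) + intercept."""
    xs = [math.log(w) for w in water]
    ys = [math.log(t) for t in decay_time]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_lifetime(slope, intercept, water):
    """Evaluate the fitted line at a new water level and undo the log."""
    return math.exp(slope * math.log(water) + intercept)


if __name__ == "__main__":
    # HYPOTHETICAL accelerated-aging points (not data from the paper):
    water_torr = [80.0, 150.0, 300.0, 600.0]
    decay_hours = [9000.0, 2600.0, 650.0, 160.0]
    slope, intercept = fit_loglog(water_torr, decay_hours)
    ambient_torr = 8.8  # ASSUMED water vapor pressure at 20 C / 50% RH
    hours = extrapolate_lifetime(slope, intercept, ambient_torr)
    print(f"slope = {slope:.2f}, lifetime = {hours / 8766:.1f} years")
```

Because the ambient point lies well outside the range of the accelerated tests, the extrapolated lifetime is sensitive to the fitted slope, which is one reason such figures are best read as estimates.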


7. Prototype Development

In addition to tape, other types of media have been envisioned for the metallized web. Optical
cards and discs could be produced easily by using an embossed film as the substrate
(containing pre-formatting and tracking information), coating this web with the active
layer, and then laminating to a base substrate. Schematic representations of these prototypes
are shown in Figure 17.


8. Conclusions

A write-once data storage medium has been developed which is suitable for optical tape
applications. Typical performance values for the medium are as follows:

a) reflectivity levels between 35 and 55%,

b) laser write sensitivities of less than 2 nanojoules/bit,

c) modulation depths of ~100% @ 250 nsec and 10 mwatt laser settings,

d) carrier-to-noise levels which are >50 dB @ 30 kHz bandwidth, 1 MHz/250 nsec/10 mwatts
laser settings, and media translation speeds of 30 fps,

e) media lifetimes >10 years, and

f) read margins of >35.

Work is continuing in an effort to refine the manufacturing process parameters and qualify
the media for the CREO drive system.


1. D. Fountain, "Digital Paper", Byte, McGraw-Hill Inc., NY, February 1989.

2. D. Gelbart, "An Optical Tape Recorder Using Linear Scanning", Conference Digest Topical
Meeting on Optical Data Storage, Vancouver, Canada, pp. 34-37, March 5-7, 1990.

3. A. Ruddick and J. Duffy, "ICI's Optical Tape Offers Flexible Alternative to Rigid Optical
Media", Optical Memory News, Rothchild Consultants, pp. 10-13, February 1991.

4. P. Vogelgesang and J. Hartmann, "Erasable Optical Tape Feasibility Study", Proc. SPIE
Optical Data Storage Technology and Applications, Los Angeles, vol. 899, pp. 172-177, 1988.

5. M. Terao, S. Horigome, K. Shigematsu, Y. Miyauchi, and M. Nakazawa, "Resistance to
Oxidation of Te-Se Optical Recording Films", Proc. SPIE Optical Data Storage, Incline
Village, vol. 382, pp. 276-81, 1983.

6. S. Chao, Y. Huang, Y. Chen, and L. Yan, "Materials for Multiple Stages of Archival Optical
Recording", Proc. SPIE Optical Storage Technology and Applications, Los Angeles, vol. 899,
pp. 240-43, 1988.

7. A. Gotoh, S. Nakamichi, and S. Horigome, SPIE 1989 Technical Digest Series on Optical
Data Storage, pp. 24-27, 1989.

8. V. Kurfman and R. Gransden, US Patent 4,115,619 (1978).

9. V. Kurfman and R. Gransden, US Patent 4,211,822 (1980).

10. H. Marton and V. Kurfman, US Patent 4,241,129 (1980).

11. V. Kurfman, US Patent 4,501,208 (1985).

12. V. Kurfman, US Patent 4,998,239 (1986).

13. A.J.G. Strandjord, R.L. Yates, and D.J. Perettie, US Patent 4,998,239 (1991).

14. A.J.G. Strandjord, D.J. Perettie, and R.L. Yates, US Patent 5,016,240 (1991).

15. M. Hansen, "Constitution of Binary Alloys", McGraw-Hill, N.Y., 1958.

16. H. Haskal, "Dynamics of Pit Formation in Ablative Optical Recording", Proc. SPIE Optical
Data Storage, Incline Village, vol. 382, pp. 174-181, 1983.

17. A.J.G. Strandjord, S.P. Webb, D.R. Beaman, and S.L.B. Carroll, "Thin Film Coatings for
Flexible Optical Data Storage", Proc. SPIE Optical Thin Films III: New Developments, San
Diego, CA, pp. 127-131, vol. 1323, July 9-11, 1990.

18. OHMT-300 WORM Test System, Apex Systems Inc., Boulder, CO 80301.









Wavelength (nm) 


Figure 4. Plot of reflectivity (upper) and
transmission (lower) of SnBiCu film.



Wavelength (nm) 

Figure 5. Plot of reflectivity (upper) and 
transmission (lower) of SnBiCu/overcoat media. 



% Reflectivity


Figure 6. Laser write threshold of SnBiCu films
as a function of reflectivity.


Light Camera Laser 



Figure 7. Static laser-write test system.








Figure 8. Digitized oscilloscope trace of the
playback signal from the APEX test system.



Laser Energy 
(Nanojoules/bit) 


Figure 10. Plot of modulation depth as a function
of laser-write energy.





Figure 9. Digitized frequency spectrum from the
APEX test system.



Figure 11. Plot of carrier-to-noise level as a
function of laser-write energy.










Figure 12. Optical photograph of data written on
the SnBiCu media using the APEX system.

Figure 13. Read margin data from the CREO tape
drive system.










Temp^-1 (K^-1)


Figure 14. Optical photograph of data written by
the CREO tape drive system.


Figure 15. Environmental stability of the SnBiCu 
Media. Arrhenius plot at constant water level 
(78.6 Torr).






Figure 16. Environmental stability of the SnBiCu media. Humidity dependence.
(Plot of Ln(10% Decay Time) versus Ln[H2O].)

Figure 17. Schematic representations of new media formats.
(Layer labels: 1 or 3 mil PET, overcoat, active layer, polycarbonate (PC) and polymer sheets.)






N93-30474


ELECTRON TRAPPING DATA STORAGE SYSTEM AND APPLICATIONS


Daniel Brower, Allen Earman and M. H. Chaffin
Optex Corporation, 2 Research Court
Rockville, MD 20850





The advent of digital information storage and retrieval has led to explosive growth in data
transmission techniques, data compression alternatives, and the need for high capacity
random access data storage. Advances in data storage technologies are limiting the utilization
of digitally based systems. New storage technologies will be required which can provide higher
data capacities and faster transfer rates in a more compact format. Magnetic disk/tape and 
current optical data storage technologies do not provide these higher performance 
requirements for all digital data applications. 



A new technology developed at the Optex Corporation out-performs all other existing data
storage technologies. The Electron Trapping Optical Memory (ETOM) media is capable of
storing as much as 14 gigabytes of uncompressed data on a single, double-sided 5¼ inch disk
with a data transfer rate of up to 12 megabits per second. The disk is removable, compact,
lightweight, environmentally stable, and robust. Since the Write/Read/Erase (W/R/E)
processes are carried out 100% photonically, no heating of the recording media is required.
Therefore, the storage media suffers no deleterious effects from repeated Write/Read/Erase
cycling.


ETOM media are novel erasable data storage media which utilize the phenomenon of electron 
trapping common in a class of luminescent materials known as IR stimulable phosphors. 
They are composed of an alkaline-earth sulfide host lattice and two rare earth dopants (the
luminescent and trapping centers). Data storage is a fully photonic process which involves the 
interaction of light with the dopant ions and their electrons within the media. Also, due to 
their exceptionally wide dynamic range, these materials are capable of multilevel or non- 
binary recording. This coding technique can provide up to four times the data transfer rate 
using four discrete amplitude levels. 


The media uses two laser wavelengths to accomplish the W/R/E processes. The transfer of data
is based on a quantum effect which involves exciting a luminescent ion and passing its excited
electron to a nearby trapping ion. Once bound to the new ion, the electron falls to the ground
state of the ion; this traps the electron. This is the stored state and is a stable configuration for
the electron. It will remain trapped until a photon of the read light source excites it from the
ground state to the excited state. From here it can migrate back to a luminescent ion and fall to
the ground state. The transfer back to the ground state is accompanied by the emission of a
photon which is detected by the disk drive and indicates stored data.


Optex Corporation has developed this rewritable data storage technology for use as a basis for
numerous data storage products. Industries that can benefit from the ETOM data storage
technologies include: telecommunications, entertainment, video imagery, and data/image
acquisition and storage. Products developed for these industries are well suited for the
demanding store-and-forward buffer systems and archival storage systems needed for these
applications. For example, a digital video recording system based on 4x subcarrier sampling of
standard NTSC composite color video (i.e., the D-2 standard) requires approximately 1 gigabyte
per minute of digitized video frames, and a transfer rate of 120 megabits per second. A 130 mm
ETOM disk can store up to 14 minutes with less than 50 ms access time to any frame. If a data
compression technique such as the current MPEG standard is employed, the same ETOM disk
can store up to 18 hours of compressed digital video programming.
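The arithmetic behind these video figures can be checked with a short sketch. It assumes decimal units (1 GB = 10^9 bytes) and a constant data rate; the function names are hypothetical and the compression ratio is simply the value implied by the quoted 14 minute and 18 hour capacities, not a property of any particular codec.

```python
def gigabytes_per_minute(rate_mbit_per_s: float) -> float:
    """Convert a data rate in Mbit/s to GB/min (1 GB = 1e9 bytes)."""
    bytes_per_second = rate_mbit_per_s * 1e6 / 8
    return bytes_per_second * 60 / 1e9


def recording_minutes(capacity_gb: float, rate_mbit_per_s: float) -> float:
    """Minutes of video that fit on a disk at a constant data rate."""
    return capacity_gb / gigabytes_per_minute(rate_mbit_per_s)


def implied_compression_ratio(compressed_hours: float,
                              uncompressed_minutes: float) -> float:
    """Compression ratio implied by two recording times on the same disk."""
    return compressed_hours * 60 / uncompressed_minutes


if __name__ == "__main__":
    print(gigabytes_per_minute(120))          # data volume per minute at 120 Mbit/s
    print(recording_minutes(14, 120))         # minutes on a 14 GB disk
    print(implied_compression_ratio(18, 14))  # ratio needed for 18 hours
```

At 120 megabits per second the stream amounts to 0.9 GB per minute, which is consistent with the "approximately 1 gigabyte per minute" and 14 minute figures quoted for a 14 gigabyte disk.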






N93-30475


The "State" of "The State of The Art" In Mass Storage Technology 

Dale Lancaster, Convex Computer Corporation
3000 Waterview Parkway, Richardson, TX 75083






In the last couple of years, there has been an abnormal amount of interest and activity in the
automated storage application area. At Convex we have been heavily involved in some of
these efforts. This paper will describe some of our experiences and also discuss the trends that
are occurring in this industry.


The Tortoise and the Hare; Or Mass Storage and Very Fast Computers


Looking back on the history of computing, it is obvious that mass storage technology has
grossly fallen short of keeping up with processor technology. It was observed that as many as
25 years ago the ratio between the amount of affordable on-line disk storage and the amount of
physical memory on a computer was about 3000 to 1. In other words, a typical departmental
computer would have maybe 16k bytes of processor physical memory and about 50 Mbytes of
"affordable" disk storage. Now in 1992, that ratio has dropped to about 10 to 1 or less. This is
best illustrated graphically using the chart below. The amount of storage in bytes is
represented on a log base 2 scale over time in years. These are my own home grown numbers
but I believe they represent a fairly true picture. The really curious question is what happens when
the total amount of physical memory on a computer is more than the amount of "affordable
disk" that will be connected to it? It will be possible to buy computers with over 1 Terabyte of
memory by the end of the decade for what is considered a reasonable price. However the
equivalent amount of disk storage will be quite expensive. Maybe one option is to store all
active data in memory and spool inactive data out to very high speed tape directly and avoid
the use of disks. I doubt this will happen, but it may one day lead to this architecture.
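The ratios quoted above can be reproduced with a few lines of arithmetic. This is only a sketch, assuming binary units (1k = 1024 bytes); the helper names `disk_to_memory_ratio` and `disk_for_ratio` are hypothetical.

```python
KB = 1024
MB = 1024 ** 2
TB = 1024 ** 4


def disk_to_memory_ratio(disk_bytes: float, memory_bytes: float) -> float:
    """Ratio of on-line disk capacity to physical memory."""
    return disk_bytes / memory_bytes


def disk_for_ratio(memory_bytes: float, ratio: float) -> float:
    """Disk capacity needed to keep a given disk-to-memory ratio."""
    return memory_bytes * ratio


if __name__ == "__main__":
    # Departmental computer of about 25 years before 1992: 16k bytes, 50 Mbytes.
    print(disk_to_memory_ratio(50 * MB, 16 * KB))
    # Disk needed for a 1 Terabyte memory machine at the 1992 ratio of 10 to 1,
    # and at the older ratio of about 3000 to 1, expressed in terabytes.
    print(disk_for_ratio(1 * TB, 10) / TB)
    print(disk_for_ratio(1 * TB, 3000) / TB)
```

With binary units the historical example works out to 3200 to 1, in line with the "about 3000 to 1" estimate, and keeping that ratio for a 1 Terabyte memory would call for roughly 3000 Terabytes of disk.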

Certainly disk storage has achieved some impressive densities and speeds, but processor speeds
and memory capacities have done much better. Because the computers are running much
faster, they are producing much more data than in the past and at a rate that mass storage
technology cannot typically handle. Disk storage today costs about $2-5 per MByte and that
cost is dropping as higher density drives are produced. The problem is that disk storage will
always remain much more expensive than people want because their mass storage needs are
growing at a rate that is faster than drive technology can handle. Without new technology, it is
possible that the single largest cost for a data center will be for the storage and management of
data.

Technology now exists that allows a computer to have on-line access to virtually all the data
that exists in the computer center. Until recently only the most prestigious and "richest"
computer centers could afford this type of technology. Now even a modest computer center can
afford it and as a result everybody is getting on the bandwagon of "on-lining" all their data and
automating their data storage and management.

In the past, the solutions to on-lining data were jukeboxes and the ever popular STK Silo using
4480 tapes. Optical jukeboxes have yet to really catch on, I think mainly because the drives
are still very slow and the total amount of "affordable" storage is not very high. The 3480/4480
tapes have done quite well in this area, but because of the expensive technology to automate it,
it has not been applicable to the smaller data centers. The other problem has been that the
Unix operating system has never been able to handle the virtual disk concept (the ability to
store more data on a filesystem than what actually is available on the disk).






The software problem with Unix has now been solved in many ways by many vendors. The
most popular of these solutions is the UniTree Central File Manager (UniTree) from DISCOS.
Convex has also produced its own virtual disk product called the Convex Storage Manager
(CSM). These two products are actually complementary in that CSM handles native ConvexOS
filesystem full conditions and UniTree handles network based client access to the large
archives that it manages. We have plans at Convex to merge these two products.


The Realities of Tape Technology

The hardware problem has also been solved with the emergence of helical scan tape technology
and high speed interfaces to these tapes. The two most recent additions are VHS and D2.
Metrum Inc. has taken VHS video tape technology and adapted it for digital storage, and Ampex
and E-Systems have done likewise for the D2 tape technology. With helical scan, data is stored
by writing tracks of data at an angle across the tape rather than as longitudinal tracks of data
such as you find on 9-track and 3480. By doing this, you can achieve much higher tape density
and throughput. A single T-120 cassette can store 14 GBytes of data with access rates of 2-3
MBytes/second, and the D2 can store 25 GBytes on a small cassette and 165 GBytes on a large
cassette with access rates of 15 MBytes/sec. With these new tape technologies it is possible for
most data centers to cost effectively store all their data in a small 20 sq ft tape robot. As well,
with UniTree and CSM, all these data (between 6 and 8 TBytes) can appear to the user as being
completely on-line.


Our first experience with truly massive storage of data has been with the STK Silo. It is easy to
say this technology has been extremely reliable and easy to use. But the reality is that it has
rapidly fallen behind in performance and capacity. A single STK Silo can hold about 2.4
TBytes of data on a good day. With newer 36-track 4490 tape drives, this capacity could double
to 4.8 Terabytes. The read/write rate still hovers around 3 MBytes/sec. The footprint for this
storage is in the 150 sq ft range at a cost of about $600,000. With newer technology, it is
possible to buy an 8 Terabyte system for the same price and a footprint of about 20 sq ft.
However, STK is the incumbent and has plans to adopt the helical scan technology and
integrate it into the existing Silos. The table below gives you a comparison of the various tape
technologies available today (all of which connect to a Convex computer as well).
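
The Silo and helical scan figures quoted above can be put side by side with a little arithmetic. The following is only a sketch: the dictionary keys and function names are made up for illustration, and 1 TByte is taken as 1000 GBytes, which is an assumption rather than something the text states.

```python
# Figures quoted in the text; names and the TByte-to-GByte factor are assumptions.
SYSTEMS = {
    "STK Silo": {"capacity_tb": 2.4, "footprint_sqft": 150, "cost_usd": 600_000},
    "Helical scan robot": {"capacity_tb": 8.0, "footprint_sqft": 20, "cost_usd": 600_000},
}

GBYTES_PER_TBYTE = 1000  # assumed decimal convention


def cost_per_gbyte(system):
    """Dollars per GByte of robot-resident storage."""
    return system["cost_usd"] / (system["capacity_tb"] * GBYTES_PER_TBYTE)


def gbytes_per_sqft(system):
    """Storage density in GBytes per square foot of floor space."""
    return system["capacity_tb"] * GBYTES_PER_TBYTE / system["footprint_sqft"]


if __name__ == "__main__":
    for name, system in SYSTEMS.items():
        print(f"{name}: ${cost_per_gbyte(system):.0f}/GByte, "
              f"{gbytes_per_sqft(system):.0f} GBytes per sq ft")
```

Under these assumptions the difference in floor space density is considerably larger than the difference in cost per GByte.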



One of the tape technologies clearly not shown is 8mm. It has been our experience that this
tape technology is very unreliable in terms of re-reading tapes that have been written. We have
also heard of the tape cartridges themselves breaking after only a few hundred robotic mounts
and unmounts. Because of the relatively high rate of non-readability of the tapes, I believe
anyone would be taking undue chances in using this technology for daily use of mass storage
and data archiving. It may be that this situation will change, and then 8mm will become a very
competitive product for this application.

Helical scan is an interesting technology that was specifically designed for the video industry.
However, when taking this technology to store data, problems can occur. First of all, when
recording video, the user of this technology does not care if the recording is absolutely without
error. Since the human eye cannot perceive a single or double bit error in a million bits, it's
not a problem. However, if this is digital data, it could easily represent the data for your bank
account. Now the level of concern goes up a few notches. The Bit Error Rate (BER) for helical
scan without extensive error detection and correction is easily below 1 bit in a few gigabytes of
data. A couple of D2 and D1 vendors have produced drives of this quality. However, most users
find this BER unacceptable. The drives produced by Metrum and Ampex have BERs of 1 bit in
about 1 Terabyte of data. To achieve this level of BER, two additions were made to the
recorders. The first is a read-after-write of the data that is being recorded. The recorder will
continuously retry the write of a block of data until it has read back the entire block error free.
For reading of data, 3 levels of error detection and correction are used to recover from tape
errors due to bad heads or deteriorated tape. These drives are brand new and have just recently
been put into production. Much factory testing has been done successfully, and with extensive
field testing I believe these drives will prove to be mainstay products in mass storage.













One of the interesting features of helical scan tape, especially in the D2 recorder from Ampex
and the VHS recorder from Metrum, is that the sound track is used to store block and file positioning
information. By doing this, the tape can be "searched" at almost rewind and fast-forward
speeds. This alleviates the need to read the data on the tape to find EOFs between files to do tape
positioning. I believe this feature will help these tape drives to be even more popular for the
virtual disk application. However, on the down side, these drives are designed to use large
blocks of data to achieve the high data rates. So in a fileserving type of application where there
are many small files stored on the tape, the performance will drop drastically. Also, if this
fast search capability is not used by the software or not supported by the drive, then the file
retrieval and tape search operations bring the performance to a crawl. These two issues have
to be worked around by the software that controls these drives. For small file storage, the
virtual disk software could consider clustering these small files so that many files are
written at once in very large blocks (and retrieved in like manner). For tape search, it is
imperative that both the software and the hardware support the use of the sound track for
block and file positioning on tape.
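
The clustering idea for small files can be sketched as follows. This is a toy illustration, not the method of any product named here: the block size, the TapeBlock class and the function names are all hypothetical, and a real system would also have to record which tape and block position hold each cluster.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MByte tape block


@dataclass
class TapeBlock:
    payload: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # file name -> (offset, length)


def cluster_files(files, block_size=BLOCK_SIZE):
    """Pack (name, data) pairs into blocks of at most block_size bytes.

    A file larger than block_size gets a block of its own in this sketch.
    """
    blocks, current = [], TapeBlock()
    for name, data in files:
        # Start a new block when the next file would overflow the current one.
        if current.index and len(current.payload) + len(data) > block_size:
            blocks.append(current)
            current = TapeBlock()
        current.index[name] = (len(current.payload), len(data))
        current.payload.extend(data)
    if current.index:
        blocks.append(current)
    return blocks


def read_file(blocks, name):
    """Retrieve one small file by reading the large block that contains it."""
    for block in blocks:
        if name in block.index:
            offset, length = block.index[name]
            return bytes(block.payload[offset:offset + length])
    raise KeyError(name)
```

Writing each block in a single large transfer keeps the drive streaming, at the cost of reading a whole block back to recover one small file.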

The last issue with the helical scan drives is head wear. The average lifetime of the heads
(there are typically 4 in a single drive) is projected to be about 500 hours of actual read/write
time. At first, this seems pretty low, but the reality is that the tape heads are never constantly
in use; much of the time is used to do tape search (if using the sound track) and rewind.
Assuming there are at least two of these drives used in a virtual disk application, it would be
safe to assume a head-use duty cycle of about 20-30% of wall clock time. This means that the
heads have to be replaced about every 3 or 4 months. The heads are quite expensive, especially
on the D2 tape. As these drives and heads go into mass production, the costs and durability of
the heads should get much better.
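
The head replacement interval follows from the projected head life and the duty cycle. A rough sketch of that arithmetic is given below; the function name is made up, and round-the-clock operation and a 30-day month are assumptions.

```python
HEAD_LIFE_HOURS = 500  # projected read/write lifetime quoted in the text


def months_between_replacements(duty_cycle, head_life_hours=HEAD_LIFE_HOURS):
    """Wall-clock months until head_life_hours of read/write time accumulate.

    Assumes the drive is powered 24 hours a day and a month has 30 days.
    """
    hours_per_month = 24 * 30
    return head_life_hours / (duty_cycle * hours_per_month)


if __name__ == "__main__":
    for duty in (0.20, 0.30):
        print(f"{duty:.0%} duty cycle: "
              f"{months_between_replacements(duty):.1f} months")
```

Under these assumptions a 20% duty cycle gives roughly three and a half months, while a 30% duty cycle shortens the interval to a little over two months.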

A closing comment on the use of these new high speed drives: this new technology demands
that you use a computer capable of keeping the drive busy. Otherwise, why buy a drive that can
run at 15 MB/sec when your workstation can only push it to 5-10 MB/sec? Also, you will typically
have more than one of these drives, and this would further saturate any typical workstation
today. At Convex, our architecture is designed to support many such drives and also allow for
simultaneous network activity. As people begin to benchmark such performance, it will
become clear why the investment in the computer is as important as (or more so than) the mass
storage robotics and drives.








The Realities of Fileserver and New Storage Software


As mentioned earlier, the clear trend for most computer centers is to on-line all or most of
their data. The recent availability of software to do this under Unix has made it both possible and
extremely desirable. Another effort to make this more available is that of the IEEE Mass
Storage Working Committee. This Committee is tasked with coming up with a model or
standard for the design and implementation of a software based mass storage system. At this
point, the model is very high level and thus almost any data management software system is
compliant. As these folks make the model more detailed, it will cause proprietary systems and
products to be revised or replaced.

The first application that most people are interested in is that of the virtual disk
concept. A typical computer center actively uses about 10-20% of the data on the disk. The
other 80-90% would be considered old. Many vendors, including Convex, have implemented
software to handle this capability. We have modified ConvexOS, a Unix based OS, to be able to
recognize and generate file-faults: the ability to access a block of data that does not currently
exist on disk and should be paged into the disk from tape by the operating system. We detect
this fault at the read or write level, not the open. By detecting the fault at the read/write level,
this "file-fault" feature can be used by our native NFS implementation. In this manner, client
computers using Convex filesystems via NFS can store data on the NFS mounted filesystem
and have it migrated to tape automatically. If the file fault was detected only on the open of a
file, NFS could not be used, since NFS opens are not propagated back to the NFS server. In a
nutshell, we have treated the disk space much like we treat physical memory. We can page in
individual blocks of a file on demand, with read ahead, just like we page in pages of virtual
memory to physical memory. It is obvious though that this demand paged virtual disk must be
tuned very carefully; otherwise you could easily have a thrashing and resource allocation
problem due to the demand placed on it by the fast processor. In general, the paging feature
should only be used on very large files; otherwise, it is best to have the entire file read in when a
fault occurs.


By having the file-fault feature in the kernel, the migration of the file from disk to tape and
back is totally transparent to the applications running on the system. All read, write, open
and close system calls can be used without change. The migration of files is done upon a
disk-full condition or periodically as needed to keep the disk at a low-water mark for free space. One
of the main problems that will occur when using the native Unix filesystem is that you could
eventually run out of inodes (file handles) for your filesystem. If a virtual filesystem is
terabytes in size, you could assume this could easily represent hundreds of millions of files. I
would guess this is one of the major problems that will have to be solved in the future for all
native Unix filesystems.
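
The behaviour described in the last two paragraphs (a fault on access to a migrated file, and migration on a disk-full condition down to a low-water mark) can be illustrated with a toy model. This sketch is not ConvexOS or CSM code: the class, its methods and the least-recently-used policy are all assumptions, and it stages whole files rather than paging individual blocks.

```python
from collections import OrderedDict


class VirtualDisk:
    """Toy model of a disk cache in front of tape; every name is hypothetical."""

    def __init__(self, capacity, low_water):
        self.capacity = capacity      # bytes of disk available
        self.low_water = low_water    # free bytes to restore when migrating
        self.on_disk = OrderedDict()  # name -> size, least recently used first
        self.on_tape = {}             # name -> size for every migrated file
        self.faults = 0

    def free(self):
        return self.capacity - sum(self.on_disk.values())

    def _make_room(self, needed):
        # Migrate only on a disk-full condition, then keep going until the
        # low-water mark for free space is restored.
        if self.free() >= needed:
            return
        while self.on_disk and self.free() < needed + self.low_water:
            name, size = self.on_disk.popitem(last=False)
            self.on_tape[name] = size

    def write(self, name, size):
        self.on_disk.pop(name, None)
        self._make_room(size)
        self.on_disk[name] = size

    def read(self, name):
        if name not in self.on_disk:      # file fault: stage back from tape
            self.faults += 1
            size = self.on_tape[name]
            self._make_room(size)
            self.on_disk[name] = size
        self.on_disk.move_to_end(name)    # mark as most recently used
        return self.on_disk[name]
```

The application only ever calls read and write; whether the data came from disk or had to be staged from tape is hidden behind the same interface, which is the point of handling the fault below the system call level.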

Another application that is very popular is a centralized network archive/fileserver system.
At Convex, we have embraced UniTree as the primary software package to handle this. UniTree
creates and exports its own filesystem to client computers on a network. The client computers
access the files in UniTree via FTP and/or NFS. The client computers simply store data to this
filesystem using familiar interfaces, and the host or server computer running UniTree ensures
that the data is migrated to tape as needed. With UniTree and the new tape technology, it is
possible to have a network fileserver/archiver that can store on the order of several terabytes that
can be transparently accessed by client computers, and it does not have the limited number of
inodes problem that occurs with normal Unix filesystems.

One of the major weaknesses with network based archiving and fileserving using this new
technology is that NFS (Network File System) has not a clue about file migration and long
access times due to tape mounts and such. In the end, something will have to be done to solve
these problems. The timeout problem with NFS is fixed by simply tuning all your NFS clients
to have a longer timeout period for those filesystems that are known to be under file migration
control. The other problem of identifying files that are migrated cannot be easily fixed with
NFS. When doing a long listing of the files on that filesystem, there is no way to tell which files
are migrated and which ones are not. Also, there is no easy signalling mechanism to inform
the user that the file being accessed is currently being staged onto disk. This is something that
will eventually have to be taken care of by some means.

At Convex, we have installed several UniTree systems around the world. We found quickly that
UniTree out of the box from DISCOS (the providers of UniTree) was far from being a production
level product. Working with Titan Corporation, we have produced the world's first production
quality UniTree system. We found many problems related to data integrity, disk full
conditions and, in general, situations where the UniTree system is stressed by hundreds of requests.
DISCOS continues to improve the quality of the software, and we believe that as other vendors
bring the product to market it will become the dominant file management product.

One of the critical things we have learned is that when bringing up any mass storage solution, it
should be done slowly. Trying to move a Terabyte of data into an on-line state in one or two days
is not ideal, since the system may indeed work fine but will be swamped with all this new data,
of which in reality only 10% will ever really be used. It will take several days or maybe even
weeks for a mass storage system to become stable. Stable, in this context, would mean that
most of the data that will be used on a regular basis will be in the disk cache and the data not
used often or at all will be in the tape system.

Related to network fileserving is the Distributed Computing Environment (DCE) from
OSF. The Distributed File System (DFS) portion of DCE (known as the Andrew filesystem) will
allow for the creation of a single monolithic filesystem over an unlimited number of
computers. Many of the companies I have talked to are very interested in using this
technology. Currently DFS is not in production and is only used experimentally, mainly at
universities.

One of the weaknesses of DFS already is that it does not have the ability to migrate files to and
from tape. So when a DFS filesystem gets full, you have the same problems as before with the
normal Unix filesystem. However, there are efforts under way to integrate DFS with UniTree
to allow for a virtual disk that serves a whole network of computers. I believe this will create a
"perfect" world for most people as long as Andrew and UniTree live up to their billings.

Another technology that is based on Convex hardware and software is the EMASS storage
system from E-Systems. It is an integrated data management solution consisting of D2 tape
drives and robots and the Fileserv software. One of the basic concepts of EMASS is that of
scalable/growable data storage. It is a fact that as people on-line their data, they will continue
to need more and more data storage. With the EMASS DataLibrary, you can grow from about 27
Terabytes for the first module up to 10 Petabytes. There are many sites in the world that
could use this capability and capacity today. The average amount of storage used by centers
today is about 1 Terabyte. By the year 2000, this number will grow to about 100 Terabytes or
more. Given this to be the case, EMASS is positioned well to handle these requirements.

The Fileserv software is very extensive in its ability to track and store data. It has a very rich
accounting system, and the interface to the system is via the normal ConvexOS filesystem using
our file-fault interface. One of the most interesting features about Fileserv is that it supports
what I call tape sniffing. This is the ability to automatically track the BER of all the tapes in
the system so that as tapes begin to degrade in readability, the data is copied to another tape
and the bad tape ejected from the system. Even those tapes that are never read through normal
demands on the system are tracked by simply reading those tapes. This activity is tunable and
is a background process that does not add a significant load to the system. The nice thing about
this feature is that it puts a stop to the question of how long a storage medium lasts. In this case,
it really doesn't matter as long as it's reasonable (say 5-10 years).
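
A background process of this kind might be organised along the following lines. This is a minimal sketch of the general idea only, not Fileserv's implementation: the Tape class, the scrub function and the BER threshold are all hypothetical.

```python
from dataclasses import dataclass

BER_LIMIT = 1e-11  # hypothetical threshold, in errors per bit read


@dataclass
class Tape:
    label: str
    bits_read: int = 0
    bit_errors: int = 0

    def record_read(self, bits, errors):
        """Accumulate the outcome of a normal or background read."""
        self.bits_read += bits
        self.bit_errors += errors

    @property
    def ber(self):
        # A tape that has never been read has no measured error rate yet.
        return self.bit_errors / self.bits_read if self.bits_read else 0.0


def scrub(tapes, ber_limit=BER_LIMIT):
    """Split tapes into those to keep and those whose data should be copied off."""
    keep, retire = [], []
    for tape in tapes:
        (retire if tape.ber > ber_limit else keep).append(tape)
    return keep, retire
```

In a real system the tapes on the retire list would have their data copied to fresh media before being removed, and never-read tapes would be read periodically so that they acquire a measured error rate.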





Summary

In summary, I can say that fundamentally every data center in the world is or will shortly be
very interested in solving their data management problem through far more efficient and
effective means than what they have today. I also believe that helical scan tape technology will
be the mainstay storage technology to accomplish this, coupled with an IEEE Mass Storage
Reference Model based software system. I think that it will be commonplace to have
DCE/Andrew on most computers with at least one Andrew server running UniTree as the
virtual disk manager.

I also believe that as more and more computer centers on-line most of their data, they will
want the ability to understand what data they have in order to better utilize it. This is generally known
as the meta-data problem. The question is: if there are several million files, how do you
know if you processed/read all the data on a given item or topic? By integrating expert systems
and object-oriented databases with the file management software, users can have an extremely
productive tool that in some cases would give companies a competitive edge for their particular
applications. There are some people in the world just starting to really work this issue, and I
believe within a couple of years there will be substantial prototype systems available to help
manage this problem. An interesting side effect of the meta-data problem is that companies
will need to generate more and more meta-data by "massaging" their data. This will require
both more processing power and more storage, so it is imperative that companies invest
properly in their computers so they can expand both the processing capability and their mass
storage options easily.

There is also a large portion of the computer population that is not well informed on the state
of mass storage technology. I believe as people learn about this leading edge technology,
demands for totally integrated mass storage solutions will increase dramatically.


Ampex is a trademark of Ampex Corporation
Unix is a trademark of AT&T
E-Systems, DataTower and Fileserv are trademarks of E-Systems Incorporated
DISCOS and UniTree are trademarks of DISCOS Inc.
Convex, CSM, Convex Storage Manager are trademarks of Convex Computer Corporation
STK, 4480 and STK Silo are trademarks of STK Inc.
Metrum is a trademark of Metrum Inc.






N93-80476 

Measurements Over Distributed High 
Performance Computing And Storage Systems 


Elizabeth Williams

Supercomputing Research Center
17100 Science Drive
Bowie, Maryland 20715-4300


Tom Myers

Department of Defense
9800 Savage Road
Ft. Meade, Maryland 20755-6000



1.0 Introduction

Requirements are carefully described in descriptions of systems to be acquired, but often there is no
requirement to provide measurements and performance monitoring to ensure that requirements are met over
the long term after acceptance. A set of measurements for various Unix-based systems will be available at
the 1992 Goddard Conference on Mass Storage Systems and Technologies. The authors invite others to
contribute to the set of measurements. This abstract gives the framework for presenting the measurements
of supercomputers, workstations, file servers, mass storage systems, and the networks that interconnect
them. Production control and database systems are also included. Though other applications and third party
software systems are not addressed, it is important to measure them as well.

The capability to integrate measurements from all these components, from different vendors, and from
third party software systems has been recognized, and there are efforts to standardize a framework to do
this. The measurement activity falls into the domain of management standards. Standards work is ongoing
for Open Systems Interconnection (OSI) systems management; AT&T, Digital and Hewlett-Packard are
developing management systems based on this architecture even though it is not finished. Another effort is
in the UNIX International Performance Management Working Group [1]. In addition, there are the Open
Software Foundation's Distributed Management Environment and the Object Management Group. A paper
comparing the OSI systems management model and the Object Management Group model has been
written [2].

The IBM world has had a capability for measurement for various IBM systems since the 1970's, and
different vendors have been able to develop tools for analyzing and viewing these measurements. Since IBM
was the only vendor, the user groups were able to lobby IBM for the kinds of measurements needed. In the
UNIX world of multiple vendors, a common set of measurements will not be as easy to get. It is hoped that
this paper will strengthen the effort to describe a minimum set of measurements.


2.0 Uses for Measurements 

Seven types of uses have been identified. These are:

(1) distributed computing system scheduling

(2) fire-fighting - solve immediate problems to provide acceptable response time and resource
allocation to all processes

(3) tuning systems for current workloads

(4) capacity planning

(5) allocating resources

(6) looking for trends and characterizing workloads

(7) verifying system strategies are working or assumptions about workloads are valid, e.g. locality of
reference

The following two points are very important. (1) For fire-fighting and tuning, a systems administrator must
be able to link a particular "event" to a set of user commands. The systems administrator should be able to
know when a resource is reloading slowly and which process is causing the problem. We stress that it is
important to be able to link particular events of interest back to user commands, though we know that it is
sometimes difficult. (2) Process as well as system-wide measurements are needed.

It is also understood that taking measurements and collecting them are overhead and may in extreme cases
affect the performance of the systems measured; this is not specifically addressed in this paper. However,
data can be collected at various levels of detail depending on how much overhead is involved. The most
complete level of measurement is a log or trace of each transaction or event. The next level of
measurement is a set of counters that produce a histogram, which is an approximation to the distribution, of the
metric of interest. The least detailed level of measurement is a single counter from which the average and
variance of the metric of interest can be derived. The level of measurement for any component depends on
the overhead associated with the workload.
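
The three levels of detail can be contrasted in a short sketch. The class names are hypothetical, and the least detailed level is modelled here with three running accumulators (count, sum and sum of squares), which is one common way to recover an average and a variance; the text itself does not prescribe a particular representation.

```python
from bisect import bisect_right


class Trace:
    """Most detailed level: keep every event."""

    def __init__(self):
        self.events = []

    def record(self, value):
        self.events.append(value)


class Histogram:
    """Middle level: one counter per bucket approximates the distribution."""

    def __init__(self, edges):
        self.edges = list(edges)                   # ascending bucket boundaries
        self.counts = [0] * (len(self.edges) + 1)  # one extra bucket for overflow

    def record(self, value):
        self.counts[bisect_right(self.edges, value)] += 1


class Summary:
    """Least detailed level: running totals only."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def record(self, value):
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def mean(self):
        return self.total / self.n

    def variance(self):
        # Population variance: E[x^2] - (E[x])^2.
        return self.total_sq / self.n - self.mean() ** 2
```

Storage and update cost fall from Trace to Histogram to Summary, which is the overhead trade-off described above.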


3.0 Model of Distributed High Performance Computing Systems 

In Figure 1 we present a model of the components of a distributed high performance computing system.
This model includes input sources to indicate the collection of data for processing in the system. The
distributed characteristics of this model are not depicted specifically, but one can think of NASA's EOS system
as the basis for this model.

The components in the model are supercomputers, workstations, mass storage systems, file servers,
networks, input machines, database systems and production control systems. The model represents both
hardware and distributed software aspects of the components. Each circle represents a hardware component.
Each square represents a software component that may be implemented on some subset of the hardware
components. The network is represented by arrows indicating interconnections. The dots indicate a set of
distributed components.

Below the system component level are lower level resources that are also necessary to measure. These are
the hardware resources such as CPU, memory, disks, channel, external I/O, paging and caches, and the
software resources such as buffers and queues.

Measurements at both system component and hardware/software resource levels are desired.


4.0 References 

(1) Leon Traister and Terry Flynn, "A Measurement Architecture for Unix-Based Systems", CMC
Transactions, Winter, 1991, pp. 69-77.

(2) Peggy Quinn and George Preoteasa, "Reconciling Object Models for Systems and Network
Management", Technical Report, UNIX System Laboratories, Inc.






N93-80477 


Analysis of Cache for Streaming Tape Drive


V. Chinnaswamy

8 Quail Hollow Road

Westboro, MA 01581


1 Introduction 



A tape subsystem consists of a controller and a tape drive. Tapes are used for backup,
data interchange and software distribution. This paper is concerned only with the backup
operation. During a backup operation, data is read from disk, processed in the CPU and then
sent to tape. The processing speeds of a disk subsystem, CPU and a tape subsystem are
likely to be different. A powerful CPU can read data from a fast disk, process it and supply
the data to the tape subsystem at a faster rate than the tape subsystem can handle. On the
other hand, a slow disk drive and a slow CPU may not be able to supply data fast enough
to keep a tape drive busy all the time. The backup process may supply data to the tape drive
in bursts. Each burst may be followed by an idle period. Depending on the nature of the
file distribution on the disk, the input stream to the tape subsystem may vary significantly
during backup. To compensate for these differences and optimize the utilization of a tape
subsystem, a cache or buffer is introduced in the tape controller.


Most of the tape drives today are streaming tape drives. A streaming tape drive goes into
reposition when there is no data from the controller. Once the drive goes into reposition,
the controller can receive data, but it cannot supply data to the tape drive until the drive
completes its reposition. This reposition time may vary from several milliseconds to a few
seconds depending on the technology of the drive. A controller can also receive data from
the host and send data to the tape drive at the same time.


This paper investigates the relationship of cache size, host transfer rate, drive transfer
rate, reposition and ramp up times for optimal performance of the tape subsystem. Formulas
developed here will also show the advantages of cache watermarks to increase the streaming
time of the tape drive, the maximum loss due to insufficient cache, trade offs between cache and
reposition times, and the effectiveness of cache on a streaming tape drive due to idle times
or interruptions in host transfers.


In Section 2, several mathematical formulas are developed to predict the performance of
the tape drive. Some examples are given in Section 3 illustrating the usefulness of these
formulas. Finally, a summary and some conclusions are provided in Section 4.


2 Mathematical Analysis

The performance of a tape subsystem depends on several variables and their relationships.
In this section, several formulas are developed for the throughput of the tape drive.

Let 


* $\lambda$ denote the host transfer rate,

* $\mu$, the drive transfer rate,

* $C$, the cache size,

* $t_r$, the reposition time,

* $t_d$, the ramp up time delay to request,

* $t_s$, the streaming time before next reposition.

Any other variables of interest will be defined as needed. For now, refer to $P = t_r + t_d + t_s$ as
a period. All the throughput numbers will be in kilobytes per second. All times will be in
seconds.


2.1 Host transfer rate < the drive transfer rate

case i: $\lambda < \mu$, $\lambda(t_r + t_d) < C$, no idle time in host transfer,

i.e., cache does not get filled up during reposition and ramp up time.


[Figure: amount of data in cache versus time for case i. Over one period the cache level rises during $t_r$ and $t_d$ without reaching the cache size, then falls during $t_s$.]

$t_s$ denotes the drive streaming time. Each period repeats itself until the whole backup
operation is over. So throughput can be calculated from just one period.

case ii: $\lambda < \mu$, $\lambda(t_r + t_d) > C$, no idle time in host transfer,

i.e., cache gets filled up during reposition and ramp up time.


[Figure: amount of data in cache versus time for case ii. The cache level reaches the cache size during $t_r$ and $t_d$, then falls during $t_s$.]



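When the cache gets full and the drive is in reposition, host transfer gets blocked. When this
happens bandwidth is lost. There is no idle time in host transfer except during this blocking
time. The above two cases will be analyzed first before getting into several other cases.

Analyzing the figures for one period, we get the following relationships:

$$\lambda(t_r + t_d + t_s) = \mu t_s, \quad \text{if } \lambda(t_r + t_d) \le C$$

$$C + \lambda t_s = \mu t_s, \quad \text{if } \lambda(t_r + t_d) > C$$

From these two equations, we can get the value of $t_s$:

$$t_s = \begin{cases} \dfrac{\lambda(t_r + t_d)}{\mu - \lambda}, & \text{if } \lambda(t_r + t_d) \le C \\[2ex] \dfrac{C}{\mu - \lambda}, & \text{if } \lambda(t_r + t_d) > C \end{cases}$$

Since the process repeats itself for each period, the effective throughput, $T$, of the tape
subsystem can be calculated from one period.

$$T = \frac{\text{Total Data Transferred}}{\text{Total Time}} = \frac{\mu t_s}{t_r + t_d + t_s}$$

We may often refer to $T$ as an approximate throughput since we are neglecting the initial
time due to label checking, track turn around time, etc. However, these times would become
negligible when we are considering several hours of backup time.

Using the conditions above, we get

$$T = \begin{cases} \lambda, & \text{if } \lambda(t_r + t_d) \le C \\[1ex] \dfrac{\mu C}{(\mu - \lambda)(t_r + t_d) + C}, & \text{if } \lambda(t_r + t_d) > C \end{cases}$$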
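
The two cases can be checked numerically with a short sketch. Here lam and mu stand for the host and drive transfer rates in KB/sec, cache for the cache size in KB, and t_r and t_d for the reposition and ramp up times in seconds; the function names and the way the 1.36 second total is split between t_r and t_d in the usage lines are assumptions made for illustration.

```python
def streaming_time(lam, mu, cache, t_r, t_d):
    """Drive streaming time per period, in seconds, for lam < mu."""
    if lam >= mu:
        raise ValueError("this sketch covers only lam < mu")
    # Data buffered when streaming starts: limited by the cache size (case ii).
    buffered = min(lam * (t_r + t_d), cache)
    return buffered / (mu - lam)


def throughput(lam, mu, cache, t_r, t_d):
    """Effective throughput over one period, in KB/sec."""
    t_s = streaming_time(lam, mu, cache, t_r, t_d)
    return mu * t_s / (t_r + t_d + t_s)


if __name__ == "__main__":
    # Case i: the cache never fills, so the drive keeps up with the host.
    print(throughput(lam=300, mu=800, cache=512, t_r=1.0, t_d=0.36))
    # Case ii: the cache fills during reposition and throughput drops below lam.
    print(throughput(lam=628, mu=800, cache=512, t_r=1.0, t_d=0.36))
```

In case i the computed throughput equals the host rate; in case ii it falls below the host rate because the host is blocked while the cache is full.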


2.1.1 Maximum loss in effective throughput

When $\lambda < \mu$ and $\lambda(t_r + t_d) > C$, the host transfer is blocked when the drive is in reposition.
In this case, there is a loss in throughput due to insufficient cache.

The loss in throughput due to insufficient cache is given by

$$L = \lambda - \frac{\mu C}{(\mu - \lambda)t + C}$$

Differentiating with respect to $\lambda$, we can prove that the maximum loss occurs when

$$\lambda = \left(\frac{C}{t} + \mu\right) - \sqrt{\frac{\mu C}{t}}$$

where $t = t_r + t_d$. For $C = 512$, $t = 1.36$, and $\mu = 800$, the maximum loss occurs when $\lambda \approx 628$.
When $\lambda = 628$ KB/sec, we get only a throughput of 550 KB/sec, a loss of 78 KB/sec.
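
The worked example can be reproduced with a few lines of code. The sketch below assumes the loss is the host rate minus the effective throughput mu * C / ((mu - lam) * t + C), with t the sum of the reposition and ramp up times, and that the worst host rate is mu + C/t - sqrt(mu * C / t), which is what setting the derivative of the loss to zero gives; the function names are made up.

```python
import math


def loss(lam, mu, cache, t):
    """Throughput lost to a full cache, in KB/sec; zero when lam * t <= cache."""
    if lam * t <= cache:
        return 0.0
    return lam - mu * cache / ((mu - lam) * t + cache)


def worst_host_rate(mu, cache, t):
    """Host rate at which the loss is largest (stationary point of loss)."""
    return mu + cache / t - math.sqrt(mu * cache / t)


if __name__ == "__main__":
    mu, cache, t = 800.0, 512.0, 1.36
    lam_star = worst_host_rate(mu, cache, t)
    print(f"worst host rate: {lam_star:.1f} KB/sec")
    print(f"loss at that rate: {loss(lam_star, mu, cache, t):.1f} KB/sec")
```

With the values from the example, the worst host rate comes out close to 628 KB/sec and the loss close to the 78 KB/sec quoted, allowing for rounding.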




2.1.2 Cache Watermarks

When $\lambda(t_r + t_d) < C$, we can think of introducing a watermark level at $C - \lambda t_d$ such that
we fill up the cache before the tape drive starts transferring data.

When the controller tells the drive to start writing data, the drive does not start writing
data immediately. There is a ramp up delay in its response time. This time is not negligible
for some drives. Suppose the ramp time is 0.5 seconds. If the drive is told to transfer data
when the cache is 100 percent full, the host transfer will be blocked for 500 milliseconds.


[Figure: amount of data in cache versus time with a cache watermark. The cache level rises through $t_r$, $t_w$ and $t_d$, passing the watermark level and reaching the cache size, then falls during $t_s$.]

In this cast, we have 

A(4 + <y + *4 + - id, if X{tr + tg} < C 

where t, is tiie additional wait ti ne to bring the data in cache to the watermark leveL 
There is no point in setting a watermark if A(tr + ta} > C. 

Solving for t«, we get 

. -I- «.->•*<) 

Im-a) 


Throughput = = - TifeiT 

Using the vai..d of t„ we get 

T = 




= A 


Introducing a cache watermark has not changed the throughput. But the streaming time has 
increased (and consequently the number of repositions during a given time has decreased) 
since 

$$\frac{\lambda(t_r + t_w + t_d)}{\mu - \lambda} > \frac{\lambda(t_r + t_d)}{\mu - \lambda}$$

Given the total time or total amount of data, we can easily calculate the number of repositions 
saved by using the cache watermark. If an increased number of repositions causes any 
reliability concerns, it is worth considering introducing cache watermarks when $\lambda(t_r + t_d) < C$. 
When $\lambda(t_r + t_d) > C$, there is no point in introducing a cache watermark. It does not change 
the throughput. 
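The saving in repositions can be estimated as follows. This is a minimal sketch under stated assumptions: the streaming time per cycle is the fill time multiplied by λ/(μ − λ), and the wait time t_w is chosen so that the cache reaches the level C − λ·t_d before the drive is started. The parameter values and helper names are illustrative only.

```python
def cycle(lam, mu, t_r, t_d, t_w=0.0):
    """Return (cycle length, streaming time) in seconds for one reposition cycle."""
    fill = t_r + t_w + t_d                 # time during which only the host is active
    t_s = lam * fill / (mu - lam)          # streaming time needed to drain the cache
    return fill + t_s, t_s


lam, mu, cache = 300.0, 800.0, 512.0       # KB/s, KB/s, KB (assumed example values)
t_r, t_d = 1.0, 0.35                       # seconds

# Assumed watermark rule: start the drive when the cache holds cache - lam * t_d.
t_w = (cache - lam * t_d) / lam - t_r

plain_cycle, plain_ts = cycle(lam, mu, t_r, t_d)
wm_cycle, wm_ts = cycle(lam, mu, t_r, t_d, t_w)

hour = 3600.0
print(f"without watermark: {hour / plain_cycle:.0f} repositions per hour")
print(f"with watermark:    {hour / wm_cycle:.0f} repositions per hour")
print(f"throughput in both cases: {mu * plain_ts / plain_cycle:.0f} KB/s")
```

Both configurations deliver the host rate, but the watermark lengthens each cycle and so reduces the number of repositions in a given time.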





2.1.3 Host transmission has idle periods 

Let 

• $T_i$ be the continuous host transfer time 

• $t_i$ be the idle period before the next transmission. 

We will assume that these times are constants and do not vary from period to period. 

case iii: $\lambda < \mu$, $\lambda(t_r + t_d) < C$, $t_i > \frac{C}{\mu} + t_r$ 


[Diagram: amount of data in cache versus time for case iii, shown over successive host transfer periods $T_i$ and idle periods $t_i$.] 

The approximate effective throughput is given by 

$$T \approx \frac{T_i \lambda}{T_i + t_i}$$

The results are also true for the case $0 < t_i < \frac{C}{\mu} + t_r$. 

case iv: $\lambda < \mu$, $\lambda(t_r + t_d) > C$, $t_i > \frac{C}{\mu} + t_r$ 


[Diagram: amount of data in cache versus time for case iv, shown over successive host transfer periods $T_i$ and idle periods $t_i$; the cache reaches its full size during each reposition.] 





In this case, the host transfer is blocked when the cache gets full. The approximate effective 
throughput is given by 


$$T \approx \frac{\left(T_i - n\left(t_r + t_d - \frac{C}{\lambda}\right)\right)\lambda}{T_i + t_i}$$


where n is given by 


•*=1 




J 


If pi -i^t, +t4 + ^j)) > (1+ 5^)*4 + *r 
then $n = n + 1$. 


The results are also true for the case 0 


2.2 Host Transfer Rate > Drive Transfer Rate 

In this section, we will analyze all cases arising from the condition when host transfer rate 
exceeds the transfer rate of the drive. 


2.2.1 No Idle Time in Host Transfer 

case v: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, no idle time in host transfer. 

[Diagram: amount of data in cache versus time for case v; the cache rises steadily to its full size.] 

In this case, input is blocked as soon as cache is full. Input rate will be limited to the output 
rate. Effective throughput of the tape drive is the maximum throughput capacity of the tape 
drive. Cache has no significant impact. 

case vi: $\lambda > \mu$, $\lambda(t_r + t_d) > C$, no idle time in host transfer 





[Diagram: amount of data in cache versus time for case vi; the cache reaches its full size before the drive starts transferring data.] 

In this case, input is blocked as soon as cache is full. There is no input or output transmission 
for some period. This is a lost bandwidth. After this no transmission period, input rate will 
be limited to the output rate. Cache again has no impact. 


2.2.2 Host transmission has idle periods 

case vii: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, $t_i < \frac{C}{\mu}$ or $\lambda > \mu$, $\lambda(t_r + t_d) > C$, $t_i < \frac{C}{\mu}$ 


[Diagram: amount of data in cache versus time for case vii, shown over successive host transfer periods $T_i$ and idle periods $t_i$.] 


In this case, the input transmission begins before the drive empties the cache. The drive 
streams all the time. The effective throughput is approximately the same as the drive 
transfer rate. 

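case viii: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, $\frac{C}{\mu} < t_i < \frac{C}{\mu} + t_r$ or $\lambda > \mu$, $\lambda(t_r + t_d) > C$, $\frac{C}{\mu} < t_i < \frac{C}{\mu} + t_r$ 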

i.e., the input transmission begins after the drive empties the cache, but before the drive 
completes its reposition. 





[Diagram: amount of data in cache versus time for case viii, shown over successive host transfer periods $T_i$ and idle periods $t_i$.] 


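$$T \approx \frac{\left(T_i + t_i - \frac{C}{\mu} - t_r - t_d\right)\mu + C}{T_i + t_i} = \frac{(T_i + t_i - t_r - t_d)\mu}{T_i + t_i} = \left(1 - \frac{t_r + t_d}{T_i + t_i}\right)\mu$$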

The approximate effective throughput is less than the drive transfer rate. How much less 
will depend on the reposition time and idle time. 

case ix: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, $t_i > \frac{C}{\mu} + t_r$ or $\lambda > \mu$, $\lambda(t_r + t_d) > C$, $t_i > \frac{C}{\mu} + t_r$ 

The idle period is longer. The drive empties cache, completes reposition and then waits for 
data. 


[Diagram: amount of data in cache versus time for case ix, shown over one host transfer period $T_i$ and one idle period $t_i$.] 


The effective throughput for one period is given by 

$$T = \frac{(T_i - t_d)\mu + C}{T_i + t_i} = \frac{(T_i + t_i)\mu - t_d\mu + C - \mu t_i}{T_i + t_i} = \left(1 - \frac{t_i + t_d - \frac{C}{\mu}}{T_i + t_i}\right)\mu$$

The approximate effective throughput is less than the drive transfer rate. How much less 
will depend on the reposition time, wait time and idle time. 
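The three idle-period cases for a host that is faster than the drive can be gathered into one function. This is a minimal sketch, assuming the cache is full when each idle period begins and using the approximate expressions of cases vii, viii and ix; the function name and the example values are illustrative.

```python
def throughput_fast_host(mu, cache, t_r, t_d, T_i, t_i):
    """Approximate throughput (KB/s) when the host rate exceeds the drive rate mu.

    T_i is the host transfer time and t_i the idle time per period, in seconds.
    """
    drain = cache / mu                      # time for the drive to empty a full cache
    if t_i < drain:
        # case vii: the host resumes before the cache empties; the drive streams.
        return mu
    if t_i < drain + t_r:
        # case viii: the host resumes while the drive is repositioning.
        return (1 - (t_r + t_d) / (T_i + t_i)) * mu
    # case ix: the drive finishes repositioning and then waits for data.
    return ((T_i - t_d) * mu + cache) / (T_i + t_i)


mu, cache, t_r, t_d, T_i = 800.0, 512.0, 1.0, 0.35, 60.0
for t_i in (0.5, 1.0, 5.0):
    T = throughput_fast_host(mu, cache, t_r, t_d, T_i, t_i)
    print(f"idle {t_i:>4} s -> about {T:.0f} KB/s")
```

Note that the case viii and case ix expressions agree at the boundary t_i = C/μ + t_r, which is a useful consistency check on the formulas.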


3 Some Illustrative Examples 

Suppose we have a drive with $\mu = 800$ KB/sec, $t_r = 1.0$ second, $t_d = 0.350$ second, $C = 512$ KB. For 
continuous host transfer and for all $\lambda < \mu$, the graph in Figure 1 gives the throughput for 
different cache sizes. We lose some throughput with 512 KB cache. 1024 KB cache gives 
better performance than 512 KB cache. More than 1 MB cache seems to be a waste. 


Figure 1 : Cache Sizes and Throughput 

Effectiveness of Cache 



Let us consider the effect of increasing only the tape drive speed, i.e., $\mu = 1600$, $t_r = 1.0$ 
second, $t_d = 0.360$ second. Figure 2 shows the performance for various cache sizes. For all 
$\lambda < \mu$, increasing the drive transfer rate will decrease the performance of the system unless 
there is an increase in cache size. A cache size of 2 MB is needed when the drive transfer 
rate is increased to 1600 KB/sec. 
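The effect of cache size at the two drive speeds can be tabulated directly. This is a minimal sketch, assuming continuous host transfer and the blocked-host cycle model for the case where the cache fills during reposition; the host rates chosen for the sweep are illustrative.

```python
def throughput_no_idle(lam, mu, cache, t):
    """Effective throughput (KB/s) for lam < mu with no idle time in host transfer."""
    if lam * t <= cache:
        return lam                          # cache large enough: no loss
    # Assumed model: the host is blocked once the cache fills during reposition.
    return cache * mu / (cache + (mu - lam) * t)


host_rates = [200.0, 400.0, 600.0, 700.0]   # KB/s, all below both drive rates
cache_sizes = [512.0, 1024.0, 2048.0]       # KB

for mu, t in [(800.0, 1.35), (1600.0, 1.36)]:
    print(f"drive rate {mu:.0f} KB/s, reposition + ramp-up {t} s")
    for cache in cache_sizes:
        row = [round(throughput_no_idle(lam, mu, cache, t)) for lam in host_rates]
        print(f"  cache {cache:>6.0f} KB: {row}")
```

With a 512 KB cache and a 700 KB/sec host, the faster drive delivers less than the slower one, which is the effect described above.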






Figure 2: Performance for Different Cache Sizes 


Effectiveness of Cache 
Drive Transfer Rate 1600 KB/s 



A comparison of Figure 1 and Figure 2 shows that increasing the transfer rate of the tape 
drive without a comparable increase in cache size and/or decrease in reposition time has a 
negative impact on the performance for a certain range of input values. The throughput can 
be increased by reducing the reposition and ramp up time instead of increasing the cache 
size. 





Figure 3: Performance for different reposition times 

Effectiveness of reposition time for 
1600 KB/s Drive Xfer Rate, Cache 512 KB 

Input Rate (KB) 


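4 Summary of Results and Conclusions 

case i: $\lambda < \mu$, $\lambda(t_r + t_d) < C$, no idle time in host transfer $\Rightarrow T = \lambda$ 

case ii: $\lambda < \mu$, $\lambda(t_r + t_d) > C$, no idle time in host transfer $\Rightarrow T = \frac{\mu C}{C + (\mu - \lambda)(t_r + t_d)}$ 

case iii: $\lambda < \mu$, $\lambda(t_r + t_d) < C$, $t_i > \frac{C}{\mu} + t_r$ $\Rightarrow T \approx \frac{T_i \lambda}{T_i + t_i}$ 

case iv: $\lambda < \mu$, $\lambda(t_r + t_d) > C$, $t_i > \frac{C}{\mu} + t_r$ $\Rightarrow T \approx \frac{\left(T_i - n\left(t_r + t_d - \frac{C}{\lambda}\right)\right)\lambda}{T_i + t_i}$ 

case v: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, no idle time in host transfer $\Rightarrow T = \mu$ 

case vi: $\lambda > \mu$, $\lambda(t_r + t_d) > C$, no idle time in host transfer $\Rightarrow T = \mu$ 

case vii: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, $t_i < \frac{C}{\mu}$ $\Rightarrow T \approx \mu$ 

case viii: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, $\frac{C}{\mu} < t_i < \frac{C}{\mu} + t_r$ $\Rightarrow T \approx \left(1 - \frac{t_r + t_d}{T_i + t_i}\right)\mu$ 

case ix: $\lambda > \mu$, $\lambda(t_r + t_d) < C$, $t_i > \frac{C}{\mu} + t_r$ $\Rightarrow T \approx \left(1 - \frac{t_i + t_d - \frac{C}{\mu}}{T_i + t_i}\right)\mu$ 

case x: $\lambda > \mu$, $\lambda(t_r + t_d) > C$, $t_i < \frac{C}{\mu}$ $\Rightarrow T \approx \mu$ 

case xi: $\lambda > \mu$, $\lambda(t_r + t_d) > C$, $\frac{C}{\mu} < t_i < \frac{C}{\mu} + t_r$ $\Rightarrow T \approx \left(1 - \frac{t_r + t_d}{T_i + t_i}\right)\mu$ 

case xii: $\lambda > \mu$, $\lambda(t_r + t_d) > C$, $t_i > \frac{C}{\mu} + t_r$ $\Rightarrow T \approx \left(1 - \frac{t_i + t_d - \frac{C}{\mu}}{T_i + t_i}\right)\mu$ 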






When the host transfer rate is less than the drive transfer rate and the cache does not get 
filled up during reposition, the throughput rate would be the same as the host transfer rate. 
When the host transfer rate exceeds the drive transfer rate and either the host transfer has 
no idle time or the idle time is less than the time to empty cache, the throughput would be 
the same as the drive transfer rate. In all other cases, we lose throughput. The amount of 
loss would depend on the parameter values and their relationships. 

In case ii, we lose throughput either because we have insufficient cache or the reposition 
time is high. 

In cases iii, viii, ix, xi, and xii, we lose throughput because of idle time from host transfer. 
When there is an idle period, the tape drive 

• will stream if $t_i < \frac{C}{\mu}$. 

• will not stream if $t_i > \frac{C}{\mu}$. 

In case iv, we lose throughput due to both idle time and insufficient cache. 

These formulas are helpful to understand the behavior of the new tape subsystems when 
there are changes to any of the parameter values. They also predict the backup throughputs 
for any specified parameter values. 


N93-30478 


LANL High-Performance Data System (HPDS) 


M. William Collins, Danny Cook, Lynn Jones, Lynn Kluegel, and Cheryl Ramsey 

Los Alamos National Laboratory 

Computer Systems Group MS B294 

Los Alamos, New Mexico 87545 


Abstract 

The Los Alamos High-Performance Data System (HPDS) is being developed to meet the very 
large data storage and data handling requirements of a high-performance computing 
environment. The HPDS will consist of fast, large-capacity storage devices that are directly 
connected to a high-speed network and managed by software distributed in workstations. This 
paper will present the HPDS model, the HPDS implementation approach, and experiences with 
a prototype disk array storage system. 


Introduction 

Advances in massively parallel, large-memory computers and high-speed cooperative 
processing networks have created a high-performance computing environment that allows 
researchers to execute large-scale codes that generate massive amounts of data. A large 
problem will generate from tens of gigabytes up to several terabytes of data. These 
requirements are one to two orders of magnitude greater than what the best supercomputing 
data storage systems are now able to handle and will require a new generation of data storage 
systems. As the massively parallel machines become more powerful, the data handling and 
data storage requirements will likewise increase, requiring even more powerful data storage 
systems. 

To meet the data storage and especially the data handling requirements of this 
high-performance computing environment, a data storage system model, in which storage devices 
are directly connected to a high-performance network and data is transferred directly between 
the storage devices and the client machines, is needed. 


HPDS Model 

The High-Performance Data System (HPDS) model is based on storage devices that are 
connected directly to a high-performance network, such as a HIPPI-based (High-Performance 
Parallel Interface) network, so that data can be transferred directly between the storage devices 
and client machines, instead of the traditional method requiring an intermediary mainframe 
computer. 

The HPDS model is shown in Figure 1. Disk devices are used to meet high-speed and 
fast-access requirements, and tape devices are used to meet high-speed and high-capacity 
requirements. By connecting the disk and tape devices directly to a high-speed network, 
higher data transfer rates and reduced hardware costs are realized. The model uses separation 
of control and data to provide increased flexibility and performance. 


[Figure 1 diagram. Labels: Control; Client; File Server; Disk Storage Systems (Disk Server, Disk Device); Tape Storage Systems (Tape Server, Tape Device); Data.] 

Figure 1. High-Performance Data System Model. 


The recent availability of HIPPI-attached disk arrays allows implementation of a disk array 
storage system based on the HPDS model. Other computing installations have connected 
HIPPI-attached disk arrays to clients and have operated them in a master-slave mode with the 
client sending read/write commands to the disk array using a HIPPI command-data port. This 
mode necessitates implementing a device driver for each client and creates integrity and 
security problems because each client can read or write anywhere in the disk array. 

A more secure approach, and one that allows a peer-to-peer data transfer between the disk 
array and the client machine, is to associate a workstation with the disk array to implement a 
disk array storage system. In the HPDS model, this workstation is referred to as the disk 
server. All requests to store and retrieve data are made to the disk server, which then issues 
the read/write commands to the disk array through an Ethernet "command-only" port using 
TCP sockets. The read/write commands specify that the disk array is to transfer the data 
to/from the client machines using the HIPPI "data-only" port. The disk array will not accept 
commands on its HIPPI data-only port, so access can only be through the disk server. The disk 
server will provide device management and storage management capabilities for the disk array 
and will implement a data transfer protocol with the client machine. 

The same approach will be used for HIPPI-attached tape devices when they become available. 
A workstation-based tape server will be associated with a HIPPI-attached tape device to 
implement the tape storage systems shown in the HPDS model. 

The file server component of the HPDS model will implement user interface and file 
management capabilities that are distributed on multiple workstations. 








HPDS Implementation 


Implementation of the HPDS, as shown in Figure 2, is underway. The approach is to 
implement a series of prototypes that will provide improving capabilities for client machines in 
a timely manner. This approach will better allow new technologies to be used as they become 
available and for experience to be gained and used more effectively. Work has started on the 
file server and disk server prototypes. The various server components of the HPDS will be 
distributed on multiple workstations and will employ message communication using TCP 
sockets. 


[Figure 2 diagram. Labels: General Ethernet; HIPPI Switch.] 


Figure 2. High-Performance Data System Implementation. 


The file server will consist of user interface servers, name servers, and a location server. In the 
initial prototype, an interface called the Data Transfer Tool (DT Tool) will be implemented to 
transfer files or parts of files between a client machine and the HPDS. DT Tool will be 
implemented as a command interface on the client machines and as a DT Tool server on a file 
server workstation. DT Tool functionality, operation, and syntax will be similar to FTP. The 
function of the name servers is to map the user path names for files to a file identifier that is 
used to access the data from a storage system. The initial name server will provide a UNIX file 
structure and UNIX file management capabilities. The location server will map the file 
identifier to the storage system(s) that currently stores the file. The location server provides for 
the initial placement of files and for the subsequent migration/caching of files between different 
storage systems. Future user interfaces might include an implementation of an NFS-like 
transparent interface to HPDS files and a Metadata Tool that would allow users to build 
metadata files that describe and provide structured access to the data. 
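The two-step mapping described above, from path name to file identifier and from file identifier to storage system, can be illustrated with a toy in-memory model. This is only a sketch: the class names, method names, and storage system labels are hypothetical and are not taken from the HPDS software.

```python
import itertools


class NameServer:
    """Hypothetical model: maps user path names to file identifiers."""

    def __init__(self):
        self._ids = {}
        self._next_id = itertools.count(1)

    def create(self, path):
        if path in self._ids:
            raise FileExistsError(path)
        self._ids[path] = next(self._next_id)
        return self._ids[path]

    def lookup(self, path):
        return self._ids[path]


class LocationServer:
    """Hypothetical model: maps file identifiers to the storage systems holding them."""

    def __init__(self):
        self._where = {}

    def place(self, file_id, system):
        # Initial placement, or caching of an additional copy.
        self._where.setdefault(file_id, set()).add(system)

    def migrate(self, file_id, source, target):
        # Move the file from one storage system to another.
        self._where[file_id].discard(source)
        self._where[file_id].add(target)

    def locate(self, file_id):
        return sorted(self._where[file_id])


names, locations = NameServer(), LocationServer()
fid = names.create("/u/researcher/run1.dat")
locations.place(fid, "disk-array")
locations.migrate(fid, "disk-array", "tape")
print(names.lookup("/u/researcher/run1.dat"), locations.locate(fid))
```

Keeping the two maps separate means a file can migrate between storage systems without its path name or identifier changing.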







The principal functions of the disk server are to provide storage and device management for the 
disk array and to provide control for the data transfer process. A dedicated workstation will be 
used for the disk server, which will provide the view of logical storage spaces with requests to 
create/delete storage spaces, store/retrieve data in the storage spaces, query/modify attribute 
information, and status/abort requests. The disk server maps the logical storage spaces to the 
physical storage of the disk array. 

Once the disk array storage system has been implemented and evaluated, a tape storage 
system will be implemented. HIPPI-attached tape devices are not available now but may 
become available before the end of 1992. An Ampex DD-2 helical scan recorder may be 
acquired for evaluation purposes and would be equipped with a HIPPI attachment when it 
became available. Possible use of HIPPI-attached DD-1 helical scan recorders and 
HIPPI-attached IBM 3490 tape devices is also being examined. 

For a client machine to use the HPDS directly, the machine must have a HIPPI connection and 
must install special user interface and data transfer software. A "Gateway Machine" will be 
implemented to allow HPDS data store and data retrieval for machines that do not have a HIPPI 
connection or for machines where it is not practical/desirable to install the special software. 
The Gateway Machine will cache HPDS data and allow it to be accessed using standard 
protocols (i.e., FTP, NFS, AFS) over Ethernet and FDDI networks. 

The IEEE Mass Storage System Reference Model and the emerging standards for the IEEE 
Model were used in the design of the HPDS to take advantage of the IEEE Model knowledge 
base, to make the HPDS more understandable to others, and to allow future 
hardware/software systems based on IEEE Model standards to be used. The HPDS file server 
implements the IEEE Model name server, location server, bitfile server, and migration 
functionality, while the HPDS disk server implements the IEEE Model storage server and bitfile 
mover functionality. The IEEE Model system management functions of storage management, 
operations, systems maintenance, and administrative control will be implemented. 


Early Experiences 

An early prototype disk array storage system was implemented by connecting an IBM RS/6000 
workstation to the Ethernet command-only port of an IBM 9375 disk array. The IBM disk 
array consists of 16 data disks that provide a storage capacity of 23.3 gigabytes and a 
maximum data transfer rate of 55 megabytes per second. Storage management and device 
management software was implemented on the workstation. 

As shown in Figure 3, the prototype disk array storage system was connected to the Los 
Alamos Advanced Computing Laboratory HIPPI network, which allowed the disk array storage 
system to have HIPPI connections with a Thinking Machines CM-2, a CRAY Y-MP, an IBM 
3090, and a high-resolution HIPPI frame buffer. 






[Figure 3 diagram. Label: IBM 3090-300E.] 


Figure 3. Prototype Disk Array Storage System. 


A high-speed Data Transfer Protocol (DTP) that allows UNIX-based systems to transfer data 
over a HIPPI connection was implemented. DTP is based on the separation of control and data 
where control uses TCP socket connections, and "raw" data (data without headers) is 
transmitted over a HIPPI connection. This separation allows for reliable delivery of control 
messages, while simultaneously allowing large blocks of data to be transferred over the HIPPI 
with minimum overhead. DTP assumes that HIPPI error checking will detect essentially all 
data errors and that the error rate is low, so large data blocks (i.e., megabytes) can be used. 
The protocol provides flow control, block-level retransmission, and timeouts. Data transfer can 
consist of whole files, parts of files, or appending to the end of a file and can be initiated by 
either the sender or receiver. DTP is viewed as a temporary solution because the goal is to use 
TCP sockets for the HIPPI data connections eventually. 

The DTP protocol was implemented on the disk server of the prototype disk array storage system 
and on the client machines. Files can be transferred between the disk array storage system 
and the Connection Machine (CM-2) DataVault at 21 megabytes per second (limited by the 
speed of the DataVault), the CRAY Y-MP disk at 16 megabytes per second (limited by the speed 
of the CRAY disk), and the IBM 3090 expanded memory at 40 megabytes per second. 

To transmit visualization data from the disk array to the HIPPI frame buffer, the workstation 
issues a command to the disk array to write to the frame buffer. Files are transferred from the 
disk array to the frame buffer at 60 megabytes per second, which is approaching the maximum 
transfer rate of the IBM disk array. This drives the frame buffer at 12 frames per second. 











At these transfer rates, it is possible to transfer a two-gigabyte visualization file from the CM-2 
or CRAY Y-MP to the disk array in less than two minutes and then display the file on the frame 
buffer in 36 seconds. 


Conclusions 

The HPDS is aimed specifically at meeting the requirements of a high-performance computing 
environment of massively parallel machines, large-memory supercomputers, cooperative 
processing networks, high-performance visualization systems, and high-speed networks. With 
the current availability of networking components and disk array devices that operate at 100 
megabytes per second and the expected availability of high-speed, large capacity tape devices, 
it is now feasible to implement a HPDS for production use. 

The implementation of a prototype disk array storage system has demonstrated that 
workstations can be used to control the high-speed transmission of data over a HIPPl network 
between client machines and HIPPl-attached storage devices. 


Copyright. 1992. The Regents of the University of California. This document was produced 
under a U.S. Government contract (W-7405-ENG-36) by the Los Alamos National Laboratory, 
which is operated by the University of California for the U.S. Department of Energy. The U.S. 
Government is licensed to use. reproduce, and distribute this document. Permission is granted 
to the public to copy and use this document without charge, provided that this notice and any 
statement of authorship are reproduced on all copies. Neither the Government nor the 
University makes any warranty, express or implied, or assumes any liability or responsibility 
for the use of this document. 

All Los Alamos computers, computing systems, and their associated communications systems 
are to be used only for official business. The Computing and Communications Division and the 
Operational Security/Safeguards Division have the responsibility and the authority to 
periodically audit users' files. 


N93-30479 


OPTIMIZING DIGITAL 8MM DRIVE PERFORMANCE 


Overview 


The experience of attaching over 300,000 digital 8mm drives to 85-plus system platforms has 
uncovered many factors which can reduce cartridge capacity or drive throughput, reduce 
reliability, affect cartridge archivability and actually shorten drive life. Some are unique to 
an installation. Others result from how the system is set up to talk to the drive. Many stem 
from how applications use the drive, the work load that's present, the kind of media used and, 
very important, the kind of cleaning program in place. 

Digital 8mm drives record data at densities that rival those of disk technology. Even with 
technology this advanced, they are extremely robust and, given proper usage, care and media, 
should reward the user with a long productive life. The 8mm drive will give its best 
performance using high-quality "data grade" media. Even though it costs more, good "data 
grade" media can sustain the reliability and rigorous needs of a data storage environment and, 
with proper care, give users an archival life of 30 years or more. 

Various factors, taken individually, may not necessarily produce performance or reliability 
problems. Taken in combination, their effects can compound, resulting in rapid reductions in 
a drive's serviceable life, cartridge capacity or drive performance. The key to managing media 
is determining the importance one places upon their recorded data and, subsequently, setting 
media usage guidelines that can deliver data reliability. This paper explores various options 
one can implement to optimize digital 8mm drive performance. 


A Digital 8mm End User Perspective 

We generally can classify a majority of user problems to just one of two areas: either 
reliability or performance. 

The first, reliability, relates to the mechanical failure rate of a particular tape drive. It may 
surprise some people but a significant majority of the drives received in repair are not really 
broken. They are suffering from a lack of proper care and/or poor media management. The 
good news is that these types of failures can be reduced. Also, with proper evaluation and 
tracking, system integrators can plan adequate service loads, costs and charges. 

From a performance standpoint, users either have drives that are not running at transfer speed 
or writing as much data per cartridge as expected. Here too, both can be addressed and drive 
performance optimized. 


Reliability 

Typical of the industry, a reliability specification for computer hardware is based on a 
statistically distributed mean-time-between-failure (MTBF). Half the product population is 
expected to exceed the design specification while the other half will not. The entire population 
is normally represented by a bell curve. Some small percentage is expected to fail very early 
whereas an almost equal number will work forever. It's simply a rule of statistics. 





All equipment manufacturers measure product reliability by MTBF, the average time between 
hardware failures that require some form of repair. When a product is new, the useful lives of 
each of its components, both electrical and mechanical, are added by formula to reach a design 
goal MTBF, a number that usually becomes the product specification. As the product is 
improved, this number increases. Early 8mm drives were shipped with only a 20,000-hour 
MTBF and current drives are shipping with a 40,000-hour MTBF. 

In the tape industry, MTBF is expressed as total power-on hours of operation. Total power-on 
hours (POH) can be calculated as follows: a drive powered on 24 hours per day (7 days per week) 
is powered on 720 hours each month (24 hours times 30 days). Population trend MTBF is 
calculated as follows: 

$$\text{MTBF} = \frac{\text{total population} \times \text{POH/mo} \times n \text{ months}}{\text{total returns for } n \text{ months}}$$

But why do you need to understand MTBF? Population MTBF based on returns can be used to 
confirm whether or not a product is meeting its design specification, living up to the user's 
expectations as defined by the manufacturer. 

Population MTBF is calculated as follows: 

Total 8mm drives shipped     287,903 units            100 percent 
Active U.S. population       183,980 units            64 percent 
Total U.S. depot returns     2,467 per month          7,401 per quarter 
Return rate per month        2,467 / 183,980          = 1.34 percent 
Assumed POH/mo               600 
MTBF                         183,980 * 600 / 2,467    = 44,746 hours 


Given our in-house repair activity, we track our population MTBF on a quarterly basis (see 
chart). Results have shown that we have exceeded 40,000 hours since the beginning of 1990, a 
respectable track record for tape technology. 
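The population MTBF calculation can be reproduced in a few lines. This is a minimal sketch using the population, return, and power-on figures quoted in the table; the function name is illustrative.

```python
def population_mtbf(active_units, poh_per_month, returns_per_month):
    """Population MTBF in power-on hours, from one month of returns."""
    return active_units * poh_per_month / returns_per_month


active_units = 183_980        # active U.S. population
returns_per_month = 2_467     # U.S. depot returns per month
poh_per_month = 600           # assumed power-on hours per month

return_rate = returns_per_month / active_units * 100
mtbf = population_mtbf(active_units, poh_per_month, returns_per_month)
print(f"return rate {return_rate:.2f} percent per month, MTBF about {mtbf:,.0f} hours")
```

The result is a little under 45,000 hours, above the 40,000-hour specification.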

These numbers are affected by duty cycle because, although powered on, a tape drive is rarely in 
motion 100 percent of the time, reading and/or writing to tape. Duty cycle is the percentage of 
time that the drive is in mechanical motion. 


Because individual applications vary widely, a typical application is assumed for specification 
purposes. Based on customer input regarding average application use in a cross-section of each 
customer's user-base environments, Exabyte was able to define a standard application as 600 
power-on hours per month (24 hours per day, 20 days per month) with an accumulated 60 
tape-motion hours per month (a 10-percent duty cycle). 

What does this mean to the user? A user can roughly estimate how many units will be returned 
for service if the drives are performing as designed (their specified MTBF). 


$$\frac{\text{avg. POH/mo/unit}}{\text{MTBF}} \times 100 = \text{percent failures per unit/mo}$$





Planning must take duty cycle into account. A drive's life is impacted by the percentage of its 
power-on time that it's actually in motion reading and/or writing. If a drive is powered on 24 
hours a day but only reads and writes 2.4 hours per day, its duty cycle is 10 percent (2.4 hours 
divided by 24 hours), the standard application. But if it's reading and writing 4.8 hours per 
day, it's operating at twice as many hours (4.8 hours divided by 24 hours) or a 20-percent duty 
cycle. Most likely it will need repair within half the amount of time. 

$$\frac{\text{avg. no. of tape-motion hours/mo/unit}}{\text{avg. no. POH/mo/unit}} \times 100 = \text{avg. percent duty cycle}$$


The "8mm Drive Return Rate" chart shows the average percentage of tape drives that may 
require repair on a monthly basis given various duty cycles and an MTBF of 40,000 hours. 
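The expected monthly repair percentage can be estimated as follows. This is a minimal sketch, assuming that the failure rate scales linearly with duty cycle relative to the 10-percent standard application (an assumption drawn from the remark that doubling the duty cycle halves the time to repair); the function name is illustrative.

```python
def monthly_failure_percent(poh_per_month, mtbf_hours, duty_cycle=0.10,
                            standard_duty_cycle=0.10):
    """Percent of units expected to need repair per month.

    Assumes the rate at the standard duty cycle is POH / MTBF and that it
    scales linearly with duty cycle (an assumption, not a vendor formula).
    """
    base = poh_per_month / mtbf_hours * 100
    return base * duty_cycle / standard_duty_cycle


for duty in (0.10, 0.20, 0.40):
    pct = monthly_failure_percent(600, 40_000, duty)
    print(f"duty cycle {duty:.0%}: about {pct:.1f} percent of drives per month")
```

At the standard application of 600 power-on hours per month and a 40,000-hour MTBF, this gives 1.5 percent per month.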


Performance 

There are a variety of application factors that impact an 8mm drive's operation. To some 
extent, some of these factors are unavoidable. The goal or objective is to minimize their 
impact. In brief, the factors that most impact performance are the application, the type of 
media being used, how the media's being used, the operating environment and whether or not 
the drive is being kept clean. Changing some factors will affect changes in both performance 
and reliability. 

Application and System Factors 

It's fairly obvious that, to improve performance and reliability, one of the first areas to 
investigate is unnecessary duty cycle. If the 8mm drive can sustain a 500 kilobyte-per-second 
data transfer rate and, as such, take about 33 minutes to store 1,000 megabytes of data, it's 
streaming continuously and operating at peak throughput. If however the drive takes over an 
hour to perform the same job, it's taking twice as long and has doubled the duty cycle, maybe 
unnecessarily. 
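The 33-minute figure follows from simple arithmetic. This is a minimal sketch, assuming decimal units (1 megabyte = 1,000 kilobytes); the halved rate is a hypothetical example of a drive that is not kept streaming.

```python
def transfer_minutes(megabytes, kb_per_second):
    """Minutes needed to move the given amount of data at a sustained rate."""
    return megabytes * 1000 / kb_per_second / 60


streaming = transfer_minutes(1000, 500)   # drive kept streaming at full rate
starved = transfer_minutes(1000, 250)     # hypothetical: effective rate halved

print(f"streaming: {streaming:.1f} min, starved: {starved:.1f} min")
print(f"tape-motion time ratio: {starved / streaming:.1f}x")
```

Halving the effective rate doubles the tape-motion time for the same job, and with it the duty cycle.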

The first place to investigate is whether the system is maintaining sufficient data flow to the 
drive. For example, if the drive is attached to a local area network, is the drive being utilized 
during those time periods when network load is normally reduced? Keeping track of how much 
time it takes for the drive to operate on a test case data file is an excellent indication of 
network load impact. 

Is the tape drive sharing the bus with very busy disk drives? Heavy disk demand for bus access 
can dramatically affect the host's ability to keep the tape drive streaming. 

Is the system transmitting really small blocks of data? If it is, just the amount of interface 
overhead itself is going to reduce streaming performance. It would be like depositing $100, one 
penny at a time. For the 2.5-gigabyte 8mm drive, the EXB-8200, block sizes smaller than or not 
multiples of 1 K have an impact on performance and capacity. This is not as significant a 
problem for the 5-gigabyte drive, the EXB-8500, which is capable of packing variable block 
sizes. The rule of thumb is to select the largest practical block size to improve bus utilization 
and drive performance. 

Was the system software driver originally designed for a start/stop tape device? If it was. it 
may not be optimized to stream the multiple-gigabyte 8mm drives and may be forcing the drive 
into excessive start/stop motion at an expense to throughput. 





There are many legitimate reasons for adapting existing system software tape drivers to an 
8mm device. Time to market, installed base, service systems, training and a host of other 
reasons come into play. However, doing so may result in real inefficiencies in drive 
performance and reduced media reliability. 

For example, a software driver for a small-capacity serpentine-type drive might return to the 
beginning of tape to update a directory after each file or file sub-directory has been written. 
This would cause the 8mm drive to shuttle hundreds of passes up and down the tape due to its 
very large storage capacity and the requirement of full serial access along the tape for each 
update. This activity adds unnecessary passes to the tape and more head/tape contact time 
than would be required in streaming mode. Only a small percentage of time is spent in useful 
data transfer. From the user's perspective, this could appear as an apparent early tape life 
failure. In addition, the drive's read/write head life may be reduced to less than it otherwise 
should be. 

Adapted drivers should be evaluated for long-term viability. Where appropriate, drivers 
should be modified to make better use of available 8mm capabilities. In cases like this 
example, simply keeping the tape directory on the system until the tape data transfer operation 
is complete can result in a marked improvement in media and drive life. Total application run 
time will also be reduced. The opportunity to use high-speed search features may also be lost. 

Media 

Use of the right kind of media and proper media care can affect 8mm drive performance just as
much as other factors. Some media types also affect data reliability and cartridge capacity.

Several grades of 8mm media are being supplied to the data processing industry by media
vendors. Their one point of commonality stems from the fact that the magnetic tape is eight
millimeters wide. Beyond that, the media can vary widely in formulation (their composition
and structure), film thickness (the media substrate), length of media per cartridge (expressed as
meters or minutes) and, lastly, the physical construction of the cartridge (material, how it's
made, its resistance to contaminants and differences in recognition hole size and location).
The bottom line is that video grade (generic) media is optimized for video recording purposes
and data grade media is optimized for data processing.

The digital 8mm drive can read and write most generic 8mm metal-particle tapes, although
using low-quality media may cause a loss in tape capacity, data throughput, long-term
archivability and data integrity. The drive was not designed to write or read some of the newer
video tapes.

Loss in capacity and data throughput are caused by a high rate of dropouts in poor-quality
media. While recording data, the 8mm drive performs an immediate read-after-write data
verification and rewrites every block of unreadable data, making sure that all data is correctly
written and readable somewhere on the tape. Of course, this process degrades performance
throughput and eats up capacity when the drive encounters a significant number of media
errors. The drive does have a cut-off point where, after too many rewrites, it will return an
uncorrectable media error and terminate the recording process. This is done for user
protection. If the media cartridge is bad enough to warrant this type of termination, users
should not use it to store data.

Recent video introductions include tapes whose lengths exceed the industry standard 112
meters. Their capacity is expressed as time in a variety of lengths including 135, 140 and 150
minutes. Users will not gain capacity as a result of the additional length, as digital 8mm drives
assume that the tape is 112 meters, which is the maximum. Furthermore, these tapes can have
cartridge recognition hole patterns which the digital 8mm drives may or may not recognize.

Hi-8 media formulations are available as enhanced metal particle (Hi-8MP), metal evaporated
(Hi-8ME) and barium ferrite (BaFe). These were developed for video applications and do not yet
lend themselves well for use in 8mm data storage devices. Although they can be used in the
digital 8mm drive, they most likely will produce higher error rates because they produce
magnetic signal amplitudes that are different from standard metal particle.

Exabyte has a data grade media designed specifically to lessen the degradation of the magnetic
qualities (i.e., metal particles) of the tape after prolonged storage. Newly developed "powders"
encapsulate and protect the metal particles used in the magnetic coating and slow degradation.
This results in a uniform recording surface which helps ensure dependable recording and
preservation of the stored data along with extending the tape's archival/shelf life. According
to accelerated test data, the data grade tape's archival shelf life is estimated to exceed 30 years
when stored under recommended environmental conditions.

The improved formulation also has a new binder and lubricant which house the metal
particles, greatly improving the durability and, in turn, the reliability of the recording process.
Tests measuring dwell performance and repeated passes in streaming mode indicate that the
improved data grade media can withstand up to 1,500 passes under recommended
environmental conditions.

A pass occurs when any given section of tape passes through the tape path under tension. A
back-space operation, a read, a write and a forward-space operation all constitute a pass on
that section of tape. For example, a start/stop operation involves three passes. The tape comes
to a complete stop when data is discontinued (1st pass); because this stopping point is beyond
the point where data was discontinued, the drive must reverse and back up to the point where
data stopped (2nd pass); and finally, the drive proceeds to write when data becomes available
(3rd pass). Applications that require multiple searches to the same or very nearby locations
(such as directory or label areas) quickly accumulate passes in a localized area.
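The pass arithmetic above can be sketched in a few lines of Python. This is only an illustration: the three-passes-per-start/stop figure and the 1,500-pass rating come from the text, while the function names and the idea of counting visits to one localized area are assumptions made for the example.

```python
# Figures quoted in the text.
PASSES_PER_START_STOP = 3      # stop, back up, resume writing
RATED_PASSES = 1500            # data grade media, recommended conditions


def passes_from_start_stops(start_stop_events):
    """Passes added to one section of tape by repeated start/stop events."""
    return start_stop_events * PASSES_PER_START_STOP


def start_stops_until_rated_limit(rated_passes=RATED_PASSES):
    """Start/stop events at one spot before the rated pass count is reached."""
    return rated_passes // PASSES_PER_START_STOP


def localized_passes(visits, passes_per_visit):
    """Passes on a directory or label area (passes_per_visit is hypothetical)."""
    return visits * passes_per_visit


print(start_stops_until_rated_limit())   # start/stop events at a single location
```

Under these assumptions a single section of tape reaches its rated pass count after 500 start/stop events, which is why frequent directory updates wear out a localized area long before the rest of the tape.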

A newly developed backcoating helps prevent frictional changes associated with repeated usage
by protecting the tape. It maintains stable performance even when the tape is operated in
complex start/stop motions. The combined backcoating and improved media formulation
result in a tape surface which improves head performance of 8mm data storage subsystems.

Digital 8mm drives record data at densities that rival those of disk technology. Even with
technology this advanced, the drives are extremely robust and, given proper usage, care and
media, will reward the user with a long productive life. To reiterate, the drives perform best
with a high-quality "data grade" media which can prolong drive head life; provide up to a 30-
year archival life; be used for up to 1,500 passes; and deliver a cartridge shell specifically
designed for data processing which offers as many as 10,000 lid openings and closings.

Media Care 

A major problem here is overuse of tapes. If an end user is not familiar with the detailed
motion characteristics of an application, excessive passes can accumulate. Mechanical loader
and library-type applications are particularly prone to this problem. When tapes begin to
break down from overuse, they begin to generate tape debris. This causes unnecessary wear and
data integrity problems which, in turn, lead to degraded read/write performance because the
drive is forced to perform excessive error recovery routines.





Media acclimation is also important. Before using an 8mm data cartridge, allow it to
acclimate to the operating environment for twenty-four (24) hours, or for the amount of time it
has been exposed to dissimilar conditions, whichever is less.

Proper storage is a must. Always store the 8mm tapes on edge. Do not stack flat. Constant
environment control is more important than absolute temperature values, so keep the
environment constant. The closer it is kept to ideal, the better data recovery results will be.
It's also a good idea to keep a storage log on all tapes, locations, contents and history.

Tapes should be exercised on an average of once every twelve (12) months by running them
from beginning to end and back to the beginning at normal speed (not rewind or high speed).
This operation is best performed by reading to end-of-tape and rewinding at normal speed. It
will remove any stress which can build up in the tape pack during the storage interval. Tapes
stored at higher temperatures should be exercised more frequently.

Data Archival 

When data is to be archived for extended periods of time (several years), optional steps can be
taken that further ensure data integrity and recovery over and beyond digital 8mm's normal
and extensive built-in data recovery margins. The tape unit used for recording archival data
should be exceptionally clean. It is also suggested that brand new cartridges be exercised from
end-to-end up to four times at normal (not high) speed. This proving process will remove any
potential debris that could have been generated during the tape manufacturing process. This
last step is especially critical whenever the tape being used is not of high-quality,
recommended "data grade" material.

Archival data should always be recorded using the read-after-write check and rewrite features
of the 8mm drive. When recording an archival tape, the tape should be completely recorded
from end to end without stopping. This means that the system must be set up to constantly
stream data to the tape drive. When the end of tape is reached, rewind it at normal speed, not at
high speed. To store, clearly label each cartridge. Include all pertinent information such as the
model and serial number of the recording tape unit, the date, the density, any error statistics
and a log number. Always store the tape data cartridge in an 8mm cartridge storage container.
An excellent practice is to further seal the tape cartridge container in a polyethylene bag for
long-term storage.

Environmental Factors 

If high humidity exists (greater than 45 percent), increased tape coating abrasivity occurs
which causes increased tape drive head wear. The same applies to tape wear although to a
lesser degree. If low humidity exists, the combination of friction, low humidity and organic
material (contained in the magnetic coating) can cause the formation of what is called friction
polymers. These brown or bluish stain deposits appear on the head surface. They are very hard
and, as they build up, increase the effective distance between tape and head, reducing signal
strength. A desirable range for humidity is 35 to 45 percent.

As the temperature increases, the maximum allowable relative humidity to optimize drive
performance decreases. For instance, with an increase from 22 degrees Celsius (72 degrees F) to 32
degrees Celsius (90 degrees F), the maximum allowed relative humidity decreases from 80 percent to
approximately 50 percent. Thus, temperature has an indirect effect on performance by
crowding the humidity limits as temperature increases. It is an important factor to be
considered when integrating (or operating) drives into a system configuration.
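As a rough illustration of how the humidity limit tightens with temperature, the sketch below interpolates between the two points quoted above (80 percent at 22 degrees Celsius, approximately 50 percent at 32 degrees Celsius). The straight-line relationship is purely an assumption for the example; the text gives only the two end points, and the function name is hypothetical.

```python
# The two (temperature, maximum relative humidity) points quoted in the text.
T_LOW_C, RH_AT_LOW = 22.0, 80.0
T_HIGH_C, RH_AT_HIGH = 32.0, 50.0


def max_relative_humidity(temp_c):
    """Estimated maximum allowed relative humidity (percent) at temp_c.

    Assumes a linear change between the two quoted points; values outside
    that range are rejected rather than extrapolated.
    """
    if not T_LOW_C <= temp_c <= T_HIGH_C:
        raise ValueError("temperature outside the range quoted in the text")
    slope = (RH_AT_HIGH - RH_AT_LOW) / (T_HIGH_C - T_LOW_C)
    return RH_AT_LOW + slope * (temp_c - T_LOW_C)


print(max_relative_humidity(27.0))   # midpoint of the quoted range
```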





Airflow and Location 


While there are many requirements for successful integration of an 8mm drive into a system,
proper airflow and cooling are key to maximizing drive and media life. High temperatures
reduce the humidity tolerance of tapes and can cause higher drive failure rates. To maximize
drive life, a minimum temperature rise over ambient is desirable. This is achieved by drawing
sufficient airflow through the drive in order to maintain tape path temperature at or near
room temperature. Too much airflow can cause excessive amounts of dirt and dust to be
ingested by the drive, leading to performance problems. When a balanced airflow exists,
particulate contamination is generally not a concern in the average office environment.

In addition to balancing airflow, consideration should be given to mounting location. The 
preference is to keep the drive away from a floor level or other areas where dirt can collect. 
Conversely, locating a drive on or in the top of a systems cabinet may expose it to elevated 
temperatures. 

Cleanliness 

Running drives without cleaning will result in media deposit build-up in the tape path and on
the heads which, in turn, will increase error rates and ultimately result in drive failure. The
rate of deposit build-up varies widely and is dependent on tape quality, tape usage (new, worn or
ready to be retired), tape motion (streaming versus start/stop), tape path condition (aligned,
clean, new or used), and the effects of temperature and humidity on the media.

Operation without cleaning will eventually result in a significant accumulation of deposits
that can become burnished into the tape path, resulting in drive failures and/or unacceptable
drive performance. When this happens, single or even multiple cleaning passes of Exabyte
cleaning tape will NOT remove the deposits. Factory cleaning is necessary. Drives have been
returned to the factory with suspected early life head failures, only to find that excessive
deposits were the cause of failure.

Regular cleaning with an Exabyte Cleaning Cartridge will prevent this deposit build-up, as well
as maintain the tape path and read/write heads in a clean condition. Based on test findings
and allowing for the wide variety of applications and media, a preventive maintenance
specification was established for 8mm data recording drives. It stipulates that one cleaning
pass be performed at least once per month or after approximately 30 hours of
tape motion. In any worse-than-normal environment, cleaning should be more frequent. This
specification has been validated by testing and, where properly applied, has proven successful
throughout the field population.

Abrasive cleaning tapes can destroy heads. Use of other than Exabyte-approved cleaning tapes
can result in much-degraded head life. Instances have occurred in which all useful head life
has been removed from drives in as few as five cleaning passes. Also, some types of abrasive
tapes that are typical of video cleaning cartridges can deposit material on the read/write
heads, rendering them immediately useless. To avoid this problem, Exabyte Cleaning
Cartridges should be made readily available to end users, along with adequate training, strong
warnings and counseling. Use of unauthorized cleaning procedures can void warranty.

When applications do not monitor drive usage and prompt users for needed cleaning activity,
users can estimate the amounts of data transferred over operating periods and establish
regular cleaning intervals as appropriate. Cleaning frequency could be based on number of
tapes processed, number of jobs run, shifts, hours, days or weeks, etc., up to a maximum
interval of once per month.
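A minimal sketch of the cleaning rule described above follows. The 30-hour and once-per-month limits are from the text; treating a month as 30 days, the function names, and the use of a streaming transfer rate to convert data volume into tape motion hours are all assumptions for illustration. Start/stop operation adds tape motion without transferring data, so a real estimate would be more conservative.

```python
CLEANING_HOURS = 30   # tape motion hours between cleanings (from the text)
CLEANING_DAYS = 30    # "once per month", approximated as 30 days (assumption)


def cleaning_due(motion_hours, days_since_cleaning):
    """True when either the tape-motion limit or the calendar limit is reached."""
    return motion_hours >= CLEANING_HOURS or days_since_cleaning >= CLEANING_DAYS


def kilobytes_per_cleaning_interval(streaming_kb_per_s):
    """Data moved in 30 hours of pure streaming at the given rate (KB/s)."""
    return streaming_kb_per_s * 3600 * CLEANING_HOURS


# Example: a drive streaming at 246 KB/s (the 2.5 GB drive rate in the check list).
print(kilobytes_per_cleaning_interval(246))
print(cleaning_due(motion_hours=12, days_since_cleaning=31))
```

An application that only knows how much data it has written can compare its running total against the first figure and prompt the operator when the total is reached or a month has passed, whichever comes first.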





Check List 


Exabyte has developed an "Integration Check List" for its 8mm tape drives. It may help predict
the potential for drive failures that can be caused by integration and application
characteristics. All of the factors in the entire list should be evaluated for each application
because, for the most part, a single questionable variable may have no adverse effect on
performance and reliability. However, multiple questionable variables can have a
compounded detrimental effect. When unacceptable conditions exist, serious reliability and
performance problems are likely to occur. Guidelines are as follows: U = Unacceptable, ? =
Questionable, O = Optimal Condition.

Integration Factors

Tape Path Temperature                         > 4 C rise                                    U
                                              < 1 C rise                                    ?
                                              1 to 4 C rise                                 O

Tape Drive Location Contamination             High (near floor or dirt)                     ?
                                              Away from floor                               O

Application and System Factors

Average Transfer Rate (2.5 GB drive)          <= 123 KB/s                                   ?
                                              > 123 KB/s but < 246 KB/s                     ?
                                              = 246 KB/s (streaming)                        O

Average Transfer Rate (5 GB drive)            <= 250 KB/s                                   ??
                                              > 250 KB/s but < 500 KB/s                     ?
                                              = 500 KB/s (streaming)                        O

Average Block Size (2.5 GB drive)             < 1 K                                         U
                                              1 K or multiples of 1 K                       O

Average Block Size (5 GB drive)               < 1 K                                         ?
                                              [illegible]

Number of Blocks Transmitted                  Low                                           ?
(per SCSI Command)                            High                                          O

Number of Tape Passes per Use                 Unnecessary repetitious positioning           U
(representative of start/stop operations)     > 6                                           ?
                                              <= 6                                          O

Directory/Label Updating                      Frequent                                      U
                                              Once per session                              O

Monitor and/or Prompt for Cleaning Interval   None                                          ?
(on operator console)                         Yes                                           O

Monitor/Prompt for Number of Tape Passes      No                                            ?
(on operator console)                         Yes                                           O

Monitor/Prompt for Soft Error Rates           No                                            ?
(to prompt media replacement decision)        Yes                                           O





Media

Media                                         Use "generic" or "video grade"                ??
                                              Use "Data Grade"                              ?
                                              Use EXATAPE                                   O

Require Cleaning as a Condition of Warranty   No                                            U
                                              Yes                                           O

Use Approved Cleaning Cartridges              No                                            ?
                                              Yes                                           O

Support and Training

Provide Media Usage and                       No                                            ?
Handling Guidance                             Yes                                           O

Provide Cleaning Practices                    No                                            ?
Guidance, Documentation and Training          Yes                                           O

Provide Clear, Strong Warning against         No                                            U
Using Alternate Cleaning Tapes                Yes                                           O

Installation Specific

Temperature (Long Term Avg)                   < 5 C or > 40 C                               U
                                              5 C to < 16 C or > 24 C to 40 C               ?
                                              16 to 24 C                                    O

Humidity Non-Condensing (Long Term Avg)       < 20% or > 80%                                U
                                              20% to < 30% or > 45% to 80%                  ?
                                              30% to 45%                                    O

Air Conditioning Absent or Set Back           Yes                                           ?
at Night during Drive Operation               No                                            O

Operation in Excessively                      Yes                                           U
Contaminated Environment                      No                                            O

Recommended Cleaning with                     No                                            [illegible]
Approved Cleaning Cartridge                   Yes                                           O

Use Cleaning Cartridge beyond                 Yes                                           U
Specified Useful Life                         No                                            O

Tape Acclimatization                          No                                            U
                                              Yes                                           O

Onsite and offsite media storage              Uncontrolled or unknown                       ?
                                              Controlled environment within storage spec    O

Duty Cycle                                    > 3                                           ?
(approximate tape motion hours per day)       <= 3                                          O




Tape Mounted in Powered-On Drive              Entire day                                    ?
(exposure to contaminants)                    Only during operation                         O

Reliable Service Practices

Track Units                                   No                                            ?
(by serial number)                            Yes                                           O

Track Failures                                No                                            U
(by type, customer and serial number)         Yes                                           O

Track Repair Reports                          No                                            ?
(by serial number)                            Yes                                           O

Supply All Information to Service Provider    No                                            ?
(dumps, tapes, units)                         Yes                                           O

Centralized Support Organization              No                                            ?
                                              Yes                                           O
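A few rows of the check list can be expressed as simple rating functions, as in the sketch below. The thresholds are taken from the rows for tape path temperature rise, long-term temperature and long-term humidity; the function names and the wording of the summary are hypothetical, and the summary rule only paraphrases the guidance that one questionable factor may be harmless while several compound.

```python
def rate_tape_path_rise(rise_c):
    """Tape path temperature rise over ambient, in degrees C."""
    if rise_c > 4:
        return "U"
    return "O" if rise_c >= 1 else "?"


def rate_temperature(avg_c):
    """Long-term average installation temperature, in degrees C."""
    if avg_c < 5 or avg_c > 40:
        return "U"
    return "O" if 16 <= avg_c <= 24 else "?"


def rate_humidity(avg_pct):
    """Long-term average non-condensing relative humidity, in percent."""
    if avg_pct < 20 or avg_pct > 80:
        return "U"
    return "O" if 30 <= avg_pct <= 45 else "?"


def summarize(ratings):
    """Combine individual ratings along the lines of the check list guidance."""
    if "U" in ratings:
        return "unacceptable"
    if sum(r == "?" for r in ratings) > 1:
        return "compounded risk"
    return "acceptable"


ratings = [rate_tape_path_rise(2.0), rate_temperature(27.0), rate_humidity(50.0)]
print(ratings, summarize(ratings))
```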


This may be an expansive list but it is the simplest way to summarize all of the factors that can
affect digital 8mm drive performance. As you may have noticed, performance and reliability
are closely tied together. Digital 8mm drives are extremely robust and, by having these various
factors optimized, will reward the user with a long productive life. Furthermore, high-quality
8mm "data grade" media, even though it costs more, can sustain the reliability and rigorous
needs of a data storage environment and, with proper care, give users an archival life of 30
years or more.




[Figure: 2.5GB 8mm Drive MTBF, Field Ongoing Reliability. Y-axis: MTBF in hours (thousands).
X-axis: 1988, 1989, then quarterly from Q1 1990 through Q3 1991.
MTBF = (U.S. base x 600 POH) / U.S. returns. MTBF based on U.S. population.]

[Figure: 8mm Drive Return Rate, 40,000-Hour MTBF. Y-axis: percentage returns per month.
Curves for 10%, 20% and 30% duty cycle. Dated 1/1/92.]
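The MTBF formula printed on the first chart can be worked through as below. This is a sketch under stated assumptions: "600 POH" is read here as 600 power-on hours per installed unit per month, and the monthly return-rate function (operating hours per month divided by MTBF) is an assumed model for the second chart, not something the charts state. All names and the 730-hour month are hypothetical.

```python
POH_PER_UNIT_PER_MONTH = 600   # "600 POH" from the chart, as interpreted here
HOURS_PER_MONTH = 730          # average calendar month (assumption)


def field_mtbf_hours(installed_base, monthly_returns):
    """MTBF = (installed base x 600 POH) / returns, as printed on the chart."""
    return installed_base * POH_PER_UNIT_PER_MONTH / monthly_returns


def monthly_return_pct(mtbf_hours, duty_cycle):
    """Expected percentage of units returned per month (assumed model)."""
    operating_hours = HOURS_PER_MONTH * duty_cycle
    return 100.0 * operating_hours / mtbf_hours


print(field_mtbf_hours(installed_base=1000, monthly_returns=15))
for duty in (0.10, 0.20, 0.30):
    print(duty, round(monthly_return_pct(40000, duty), 4))
```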




N93-30480






Using Transparent Informed Prefetching (TIP)
to Reduce File Read Latency

R.H. Patterson, G.A. Gibson, M. Satyanarayanan
Carnegie Mellon University






Outline 


I/O performance is lagging 

No current solution fully addresses read latency 

TIP to reduce latency 

• exploits high-level hints that don't violate modularity

• converts throughput to latency

Preliminary TIP test results 






As processor performance gains continue to outstrip Input/Output gains, I/O performance is
becoming critical to overall system performance. File read latency is the most significant
bottleneck for high performance I/O. Other aspects of I/O performance benefit from recent advances
in disk bandwidth and throughput resulting from disk arrays [Patterson88], and in write performance
derived from buffered write-behind and the Log-structured File System [Rosenblum91]. The access gap
problem limiting improvements in read latency is exacerbated by distributed file systems operating
over networks with diverse bandwidth [Spector89, Satyanarayanan85]. In this paper, we focus on
extending the power of caching and prefetching to reduce file read latencies by exploiting hints
from high levels of a system. We describe such Transparent Informed Prefetching, TIP, and its
benefits. We argue that hints that disclose high-level knowledge are a means for transferring
optimization information across, without violating, module boundaries. We discuss how TIP can be
used to convert the high throughput of new technologies such as disk arrays and log-structured file
systems into low latency for applications. Our preliminary experiments show reductions in wall-clock
execution time of 13% and 20% for a multiple module compilation tool (make) accessing data on a
local disk and remote Coda file server, respectively, and a reduction of 30% for a text search
(grep) remotely accessing many small files.





            Latency              Throughput

Read        demand caching       disk arrays
            prefetching

Write       buffered writes      disk arrays
                                 buffered writes
                                 LFS


But, cache effectiveness is declining


This table shows the mechanisms most heavily used to combat the growing I/O bottleneck. Written
data benefits from write-behind buffering and log-structured file systems. I/O throughput is
directly increased by parallelism in disk arrays. Read latency, however, is only reduced by caching
and prefetching. As will be shown next, caches will not, by themselves, be able to relieve the I/O
bottleneck, and prefetching will emerge as a critical approach to the problem.














Effective I/O Performance with Caching


$$T_{I/O} = M C_M + (1 - M) C_H \approx M C_M$$

$$T_E = T_C + N_{I/O} T_{I/O} \approx T_C + N_{I/O} M C_M$$


$T_{I/O}$ = I/O time                  $M$ = cache miss ratio
$C_M$ = cost of a miss                $C_H$ = cost of a hit
$T_E$ = execution time                $T_C$ = computation time
$N_{I/O}$ = number of I/Os


Miss ratio for effective I/O performance to scale with CPU performance


CPU/IO Perf.      Current = 1      10       100

Miss Ratio        40%              4%       0.4%




Caches reduce the average I/O service time by reducing the number of I/O requests that must be
serviced by slow peripheral devices. The ratio of requests thus serviced to the total number of
requests is the miss ratio. For caches to compensate for the growing gap between CPU performance
and I/O peripheral performance, they must reduce their miss ratios. This simple model quantifies
this relationship.

The average I/O service time, $T_{I/O}$, is the weighted sum of the service times for requests
that miss in the cache and must be serviced by the I/O subsystem, $C_M$, and for requests that hit
in the cache, $C_H$. The cache miss ratio, $M$, weights the sum. Since $C_H \ll C_M$, the average
I/O service time is roughly $M C_M$. The execution time for a program, $T_E$, is the sum of the
time spent on computation, $T_C$, and the total time spent on I/O. Time spent on I/O is, in turn,
the product of the number of I/O requests, $N_{I/O}$, and the average time to service a request.
As processor improvements reduce $T_C$ relative to $C_M$, the miss ratio, $M$, must be reduced to
achieve corresponding reductions in the time spent on I/O. The table shows the improvement needed
in the cache miss ratio for the effective I/O performance to keep pace with processor gains. A
cache that currently has a 40% miss ratio must improve to 4% to match a ten-fold increase in
processor performance and to 0.4% to match the 100-fold increase expected in the next ten to
fifteen years. As the next slide shows, such miss ratios are most unlikely.
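The model above is easy to check numerically. The sketch below is only an illustration of the slide's formulas; the function and parameter names are hypothetical, and the required miss ratio is obtained by dividing the current miss ratio by the growth in the CPU/I-O performance gap, which follows from the approximation that average I/O time is roughly the miss ratio times the cost of a miss.

```python
def average_io_time(miss_ratio, cost_miss, cost_hit):
    """Weighted average service time: M * C_M + (1 - M) * C_H."""
    return miss_ratio * cost_miss + (1.0 - miss_ratio) * cost_hit


def execution_time(compute_time, io_count, miss_ratio, cost_miss, cost_hit):
    """T_E = T_C + N_IO * T_IO."""
    return compute_time + io_count * average_io_time(miss_ratio, cost_miss, cost_hit)


def required_miss_ratio(current_miss_ratio, gap_growth):
    """Miss ratio needed for I/O time to shrink by the same factor as T_C."""
    return current_miss_ratio / gap_growth


for growth in (1, 10, 100):
    print(growth, required_miss_ratio(0.40, growth))
```

With a current miss ratio of 40%, the loop reproduces the table: the miss ratio must fall to 4% for a ten-fold gap and to 0.4% for a hundred-fold gap.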















The numbers in this table are drawn from [Ousterhout85] and [Baker91]. The 1985 tracing study of
the UNIX 4.2 BSD file system predicted cache performance for a range of cache sizes assuming a 30
second flush back policy for writes. The 1991 study measured cache performance on a number of
workstations running Sprite. The Sprite cache size varied dynamically, but averaged 7 MBytes. The
diminishing returns from increasing cache size are evident in the 1985 results. Also striking is
the difference between the predicted and measured performance of a large cache. The large cache was
not nearly as effective as expected. The authors of the study concluded that growing file sizes
were to blame for the disappointing cache performance. This result is strong evidence that we
cannot rely on increased cache sizes to give us the extremely low miss ratios needed to improve
effective I/O performance. This leaves us with prefetching as a tool for improving I/O read latency.

















Transparent Informed Prefetching (TIP) 




1) Encapsulate programmer knowledge about 
future I/O requests in a hint 

2) Transfer hint to file system

3) File system uses hints to transparently 
prefetch data and manage resources 


Prefetching can pre-load the cache to reduce the cache miss ratio, or at least reduce the cost of
a cache miss by starting the I/O early, and thereby improve effective I/O performance. While there
have been a number of approaches to prefetching [Kotz91, Smith85, McKusick84, Feiertag71], it is
often difficult to know what to prefetch, and prefetching incorrectly can end up hurting
performance [Smith85].

To be most successful, prefetching should be based on knowledge of future I/O accesses, not
inferences. We claim that such knowledge is often available at high levels of the system.
Programmers could give hints about their programs' accesses to the file system. Thus informed, the
file system could transparently prefetch needed data and optimize resource utilization. We call
this Transparent Informed Prefetching (TIP).




Obtaining Hints 


Early knowledge of serial file access 
Access patterns part of code algorithm 

• large matrix supercomputing: read by row, 
read by column 


Hints generated by: programmer, compiler, 
profiler 


Critical to the success of informed prefetching is the availability of accurate and timely hints.
An important part of our research will be to expose such hints in important, I/O-dependent
applications. However, we don't think this will be as hard as it might seem. After all, the success
of sequential readahead is largely the product of "discovering" that an application is sequentially
accessing its files; this is really known a priori because a programmer has chosen to do so. Often,
it is known well in advance that many files will be thus accessed. It is a simple step to have
programmers notify the I/O system, through a hint, of sequential access patterns.

In addition to the simplest hints about sequential accesses, programmers could give hints about
more complex, non-sequential access patterns. An important beneficiary of this approach will be the
large scientific programs that execute alternating row and column access patterns on huge matrix
data files [Miller91]. At least one of these access patterns will not be sequential in the file's
linear storage, yet the pattern is easily and obviously specified by a programmer.

In addition to programmer-generated hints, compilers could automatically generate hints, or a
profiler could be used to generate hints for future runs of a program.
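To make the row and column example concrete, the sketch below shows how a program could describe both access patterns on a matrix stored row by row in a file. Everything here is an assumption for illustration: the `StridedReadHint` record, its fields, and the helper functions are hypothetical and are not an interface defined by the authors. The point is only that the column pattern, although not sequential in the file, is fully specified by a start offset, a stride and a count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedReadHint:
    """Hypothetical disclosure: 'I will read `count` items, `stride` bytes apart.'"""
    path: str
    start: int      # byte offset of the first item
    stride: int     # bytes between the starts of successive items
    count: int      # number of items that will be read
    length: int     # bytes per item

    def offsets(self):
        """Byte offsets the application will touch, in access order."""
        return [self.start + k * self.stride for k in range(self.count)]


def row_hint(path, rows, cols, row, elem_size=8):
    """Reading one row of a row-major matrix: contiguous in the file."""
    return StridedReadHint(path, row * cols * elem_size, elem_size, cols, elem_size)


def column_hint(path, rows, cols, col, elem_size=8):
    """Reading one column: one element per row, a full row apart."""
    return StridedReadHint(path, col * elem_size, cols * elem_size, rows, elem_size)


print(row_hint("matrix.dat", rows=3, cols=4, row=1).offsets())
print(column_hint("matrix.dat", rows=3, cols=4, col=1).offsets())
```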




Application Examples




grep foo *

• Shell expands * to a list of filenames.

• Grep searches for a string, 'foo,' in all the files in the list.

• From invocation, it is known that all of the files on the list will be read sequentially.

• Give a hint about all of the files at once.

make

• makefile specifies all files to be touched from the start.

• make generates hints for binaries it will invoke and the files they will touch.






While we believe that scientific applications will be major beneficiaries of TIP, common Unix
applications can also benefit. Here are two examples.

Given the command 'grep foo *', the shell expands the '*' into a list of all files in the current
directory and invokes the grep program, which searches for the string 'foo' in all the files. Grep,
or even the shell if it knows a little about grep from a command registry, can issue a hint
notifying a TIP system that all the files in the list will soon be read. If the system has stored
these files on an underutilized disk array, many or all will be fetched concurrently.

We expect programs issuing hints on behalf of other programs, such as the shell on behalf of grep,
to be a common occurrence. Another example is the 'make' program, which orchestrates the
compilation of program modules and their linking with standard libraries. 'Make' determines its
actions according to a 'makefile' of instructions. After parsing a 'makefile' and checking the
status of all modules to be built, 'make' constructs a set of command sequences that it will pass
to a shell for execution. These commands or the shell itself can issue hints about their I/O
accesses. Pursuing a TIP approach more aggressively, 'make' can use the same command registry as
the shell to issue hints even before it issues the commands.





TIP Converts High Throughput to Low Latency 


Use excess storage bandwidth to pre-load 
caches with future accesses and overlap I/O 
with computation 


Expose concurrency to pack low-priority queue 
with prefetch requests 

• Optimize seek scheduling

• High-throughput disk arrays simultaneously service multiple requests

• Multiple network requests may be batched together

Cache management superior to LRU 


Armed with knowledge of future file accesses, a system employing TIP can improve performance
in three important ways.

1) At the most basic level, TIP, as for all prefetching, can overlap slow I/O accesses with other
useful work so that applications spend less time idly waiting for these accesses to complete. But,
because TIP systems know what to prefetch, they can prefetch more aggressively to pre-load the
cache with future accesses and further reduce cache misses.

2) Using TIP, normally short I/O queues can be filled with low-priority prefetch requests, giving
more opportunities for low-level I/O optimizations. For an individual disk, deeper queues allow
better arm and rotation scheduling [Seltzer90]. For a disk array, deeper queues mean more requests
are available for concurrent servicing by independent disks. On a network, prefetch requests can be
batched together, reducing network and protocol processing overhead.

3) TIP improves cache management to further reduce cache miss ratios. If it is known what data will
be needed in the future, it may be possible to outperform an LRU page replacement algorithm, even
without prefetching. Unneeded blocks can be released early, and needed blocks can be held longer.

The first two benefits make TIP an excellent mechanism for exploiting the high throughput of
emerging storage technology to provide the low latency that these technologies cannot provide.
Combined with improved cache management, these three benefits make TIP a powerful tool for
overcoming the widening access gap.
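The third benefit can be illustrated with a small simulation. The sketch below compares plain LRU with a replacement policy that knows the full future access sequence and evicts the block whose next use is furthest away (the classic offline optimal policy, used here as a stand-in; the slides do not say which policy a TIP system would use). Function names and the access trace are hypothetical.

```python
from collections import OrderedDict


def lru_misses(accesses, capacity):
    """Count misses for a least-recently-used cache of `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)          # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict least recently used
            cache[block] = True
    return misses


def informed_misses(accesses, capacity):
    """Count misses when eviction uses disclosed knowledge of future accesses."""
    cache = set()
    misses = 0
    for i, block in enumerate(accesses):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            future = accesses[i + 1:]

            def next_use(b):
                return future.index(b) if b in future else float("inf")

            cache.remove(max(cache, key=next_use))   # furthest next use goes
        cache.add(block)
    return misses


trace = [1, 2, 3, 4] * 3          # cyclic scan slightly larger than the cache
print(lru_misses(trace, 3), informed_misses(trace, 3))
```

On this cyclic trace LRU misses on every access, while the informed policy keeps the blocks that are about to be reused and misses half as often, without doing any prefetching at all.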





Hints are Disclosure not Advice 


Hints that disclose                          Hints that advise

I will read file F sequentially with         cache file F
stride S

I will read these 50 files serially &        reserve B buffers & do not read-
sequentially                                 ahead


• Users not qualified to give advice

• Advice not portable, disclosure is

• Disclosure allows more flexibility

• Disclosure supports global optimizations

• Disclosure hints consistent with sound SWE principles


As the previous slide showed, TIP is much more tiian simple prefetching; it is a strategy for opti- 
mizing I/O. For £ number of reasons, sudi powerful optimizatic.)s dqiend on having hints diat disclose 
knowledge of future I/O operations instead of hints diat give advice about I/O subsystmn openttion. 

Advice about low-level operations depends on detailed system-specific knowledge. Even if a user 
had such knowledge of a system’s static configuration, they could not know about the system’s dynamic 
state. Thus, the user is not qualified to give advice on how to optimize the dynamic operation of the system. 
Furthermore, such system-^tecific knowledge would no* apply to ofiier systems, and so, advice that exploits 
it would not be poitable to other systems. 

Additionally, hirUs that advise, such as, ’ e this fi'e,’ do not give much usaUe knowledge to the 

TIP system. What should dK TO* system do if it cat. .ot cxhs the whole thing? Should it cache a part of the 
file? Which part? If, instead, the applictuion discloses how it will access the file, the TIP system has the flex- 
ibility to resptmd appropriately. This flexibility is crucial for balancing competing demands for global 
resources. 

Good hints that disclose are specified using the same semantics that an application later uses to
demand access to its files, whereas bad hints which advise concern themselves with a system's
implementation. It is not a coincidence that good hints are compatible with modular software design. They are
a means for transferring optimization information across module boundaries without violating those boundaries.
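
The distinction can be sketched in a few lines of Python. This is only an illustration, not the TIP interface: the names SequentialReadHint and plan_prefetch, and the idea of counting free buffers, are assumptions made for the example. The application discloses what it will read, in the same terms it will later use to read it, and the system alone decides how much of that to act on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequentialReadHint:
    """Disclosure: 'I will read file `path` sequentially with stride `stride`.'

    Hypothetical structure for illustration; not taken from the TIP system.
    """
    path: str
    num_blocks: int   # length of the file in blocks
    stride: int = 1   # the application reads every `stride`-th block

    def future_blocks(self):
        """Block numbers the application says it will read, in order."""
        return list(range(0, self.num_blocks, self.stride))


def plan_prefetch(hint, free_buffers):
    """System-side policy: decide how much of the disclosed access to prefetch.

    The application never says 'cache this' or 'reserve B buffers'. The
    system chooses, using dynamic state only it can see (here, a count of
    free buffers, which is an assumed stand-in for that state).
    """
    return hint.future_blocks()[:max(free_buffers, 0)]
```

Because the hint says nothing about buffers or caching, the same hint remains valid on a system with a different cache size or a different storage configuration; only plan_prefetch would change.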










Our research into a TIP approach began with simple, controlled experiments demonstrating the
potential benefits and obstacles of informed prefetching. Our goals with these experiments were to validate
TIP as a tool for reducing read latency, determine if more than a simple, user-level mechanism is needed,
uncover implementation problems, and develop experience incorporating hints into applications.

We used two hardware platforms for our tests. The local disk tests were conducted on a Sun
Sparcstation 2 running Mach 2.3/BSD Unix 4.3. The remote tests were run on two Decstation 5000/200s running
Mach 2.5, one of them the client, and the other the server for the Coda distributed file system
[Satyanarayanan90].

We tested the two applications previously mentioned, shell expansion of '*' for 'grep,' and 'make'
building a program called 'xcalc.' Flow charts for the two test programs are above. The chart on the left
shows the configuration for exploiting shell expansion of '*'. A fork operation splits the program into two
processes. The command runs down the left side of the fork, while an independent prefetch process runs
down the right side of the fork. The prefetch process uses the expanded list of filenames to determine what
to prefetch. The right-hand chart shows the configuration for the 'make' example. It is similar to the
previous example except that a tracing facility [Mummert92] is used to determine in advance the files to prefetch.

To prefetch from the local disk, the prefetch process simply read the appropriate files, indirectly
causing the data to be moved into the cache. To prefetch remotely from Coda, the prefetch process used a
special prefetch ioctl to explicitly and asynchronously transfer the file to the local machine.
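
A minimal Python sketch of the local-disk configuration follows. It is an assumption-laden illustration rather than the test harness used here: a helper thread stands in for the forked prefetch process, and the function names are invented for the example. The helper simply reads each file in the already expanded list and discards the data, so that the operating system's buffer cache is warmed for the real command.

```python
import threading


def prefetch(paths, chunk=64 * 1024):
    """Read each file and throw the data away, warming the buffer cache."""
    for path in paths:
        with open(path, "rb") as f:
            while f.read(chunk):
                pass


def grep(pattern, paths):
    """Return (path, line_number) pairs for lines containing `pattern`."""
    hits = []
    for path in paths:
        with open(path, "r", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if pattern in line:
                    hits.append((path, lineno))
    return hits


def grep_with_prefetch(pattern, paths):
    """Run grep while a helper walks the same expanded file list.

    A thread is used here instead of a forked process (an assumption made
    to keep the sketch portable); the reads it issues are still blocking,
    so it keeps only one prefetch request outstanding at a time.
    """
    helper = threading.Thread(target=prefetch, args=(paths,), daemon=True)
    helper.start()
    hits = grep(pattern, paths)
    helper.join()
    return hits
```

The search results are the same with or without the helper; only the time spent waiting for the disk can differ.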












Results 


Local Disk

Application    hot cache       cold cache      cold cache      % reduction
                                               w/ prefetch
make xcalc     9.1? (0.03)     14.19 (0.13)    1?.40 (0.0?)    ?
grep foo *     1.22 (0.0?)     3.29 (0.13)     3.?0 (0.0?)     0

Distributed File System (Coda)

Application    hot cache       cold cache      cold cache      % reduction
                                               w/ prefetch
make xcalc     18.?9 (?.00)    4?.4? (?)       ? (?.74)        20.?
grep foo *     1.?? (0.01)     7.?6 (0.77)     ?.?3 (0.68)     29.4

(? = digit illegible in the source)


• make xcalc: compile & link X window calculator

• grep foo *: 58 files, 1 MB

• Results limited by lack of parallelism in I/O subsystem






This table compares the elapsed times to run two applications with and without prefetching on both
the local disk and the Coda distributed file system. The first application, 'make xcalc,' compiles and builds
the X window calculator tool. The second, 'grep foobar *,' searches 58 files containing a total of 1 MByte,
all stored in (the cache of) a remote Coda file server.

The numbers in parentheses are the standard deviations for the measurements. Since the local tests
were performed on a Sun Sparcstation 2 whereas the Coda tests were performed on Decstation 5000/200s,
the numbers are not directly comparable. In the 'hot cache' runs, all data read throughout the job were in
the local buffer cache, so the job never blocked for the disk. These numbers represent a lower bound on the
elapsed time. At the start of the 'cold cache' runs, there was no data in the buffer cache or client disk cache,
though, in the distributed case, the server's buffer cache was not cleared between runs. The 'cold cache w/
prefetching' runs were started just like the 'cold cache' runs, but they used prefetching to speed access to
the files. The '% reduction' represents the benefits of prefetching.
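
The '% reduction' column can be read as the relative drop in elapsed time from the plain 'cold cache' run to the 'cold cache w/ prefetching' run. The formula is not spelled out here, so the small Python helper below is an assumption about how the column is computed, and the timings in the usage note are illustrative values, not measurements from the table.

```python
def percent_reduction(cold, cold_with_prefetch):
    """Relative drop in elapsed time, in percent, measured against the cold run.

    Assumed definition: 100 * (cold - cold_with_prefetch) / cold.
    """
    return 100.0 * (cold - cold_with_prefetch) / cold
```

For example, a job that takes 10.0 time units cold and 7.5 with prefetching shows a 25% reduction; a job whose prefetching run is no faster than its cold run shows 0%.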

TIP systems will only be able to approach the lower bound represented by the 'hot cache' numbers
when combined with high-throughput I/O subsystems unavailable for these tests. In the grep test on the local
disk, the execution time is dominated by I/O. The disk is already running flat out, so there is no time for
prefetching. Grep with a disk array would still keep one disk busy and would run in about the same amount
of time, but grep with TIP and a disk array would keep many disks busy. The total time spent on I/O would
drop and performance approaching the 'hot cache' lower bound should be possible.
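
The argument can be made concrete with a deliberately crude timing model, sketched in Python below. The model and its parameter names are assumptions introduced for illustration only: each block costs a fixed amount of compute and a fixed amount of disk time, a blocking reader alternates strictly between the two, and an informed prefetcher keeps every disk in an array busy while the application computes.

```python
def elapsed_blocking(cpu_per_block, io_per_block, blocks):
    """One outstanding read at a time: compute and I/O strictly alternate.

    Extra disks do not help, because only one of them is ever busy.
    """
    return blocks * (cpu_per_block + io_per_block)


def elapsed_with_prefetch(cpu_per_block, io_per_block, blocks, disks):
    """Idealized informed prefetching over an array of `disks` disks.

    I/O is overlapped with compute and spread evenly over the array, so the
    job is limited by whichever is slower: the array or the CPU. Start-up
    latency and scheduling overheads are ignored in this sketch.
    """
    io_bound = blocks * io_per_block / disks   # the array's throughput limit
    cpu_bound = blocks * cpu_per_block         # the 'hot cache' lower bound
    return max(io_bound, cpu_bound)
```

With 100 blocks, 1 unit of compute and 10 units of disk time per block, the blocking reader takes 1100 units. Prefetching on a single disk takes 1000, which is almost no better because the disk is already the bottleneck. On a ten-disk array the model reaches the compute-only bound of 100.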
























Lessons from Tests 


• Independent prefetch process overhead too high

• Single prefetch process => no deep prefetch queues

• Coda ioctl allowed too much prefetching

> thread starvation - need low-priority prefetching

> premature cache flushing - need to track consumption

• Poor cache buffer replacement performance

• Disk write scheduling often very inefficient


Our experiments were preliminary. They served the purpose of demonstrating the benefits
of informed prefetching and educating us about implementation pitfalls.

Using independent prefetch processes incurred a lot of extra overhead, especially in the local disk
tests. Context switching, process scheduling inefficiencies, system call cost, and, on the local disk, data copy
costs all reduced the performance of the prefetch tests. But the most serious hindrance to prefetching from
the local disk was that, because the read system calls used are blocking, there was never more than one
prefetch request in the queue at a time. Thus, we did not benefit from the scheduling advantages offered by
deeper queues.

The Coda tests avoided this problem with the asynchronous prefetch ioctl. They suffered instead
from over-prefetching. Until we reduced the priority of the prefetches, they interfered with demand fetches,
reducing performance. Also, prefetches sometimes got ahead of the actual job and caused prefetched data
that had not yet been used to be replaced in the cache by newly prefetched data. Clearly, a real system will
need to track data consumption to avoid this problem. This was an extreme example of the cache manager
making uninformed decisions. The cache held onto data that had just been used in preference to prefetched
data that was about to be used. Integrating TIP with the cache manager should greatly improve performance.
In the tests, we avoided this problem by using a very large cache that could hold all of the data.
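
One way to picture "tracking data consumption" is a sliding window that never lets the prefetcher run more than a fixed distance ahead of the application. The Python class below is a hypothetical sketch of that idea, not a description of what was built for these tests; the class name, the depth parameter, and the notion of an item (a file or a block) are all assumptions.

```python
class PrefetchWindow:
    """Keep the prefetcher at most `depth` items ahead of the consumer."""

    def __init__(self, items, depth):
        self.items = list(items)   # disclosed future accesses, in order
        self.depth = depth         # most unconsumed prefetches allowed
        self.next_fetch = 0        # index of the next item to prefetch
        self.consumed = 0          # items the application has finished with

    def ready_to_fetch(self):
        """Return the items that may be prefetched now, and mark them issued.

        Up to `depth` requests can be issued together, which gives the disk
        or network a deeper queue than one blocking read at a time.
        """
        limit = min(self.consumed + self.depth, len(self.items))
        batch = self.items[self.next_fetch:limit]
        self.next_fetch = max(self.next_fetch, limit)
        return batch

    def consume(self):
        """Record that the application has used one more item."""
        self.consumed = min(self.consumed + 1, len(self.items))
```

If depth is chosen no larger than the number of cache buffers available for prefetched data, newly prefetched data cannot push out data that has been fetched but not yet used.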

Writes of whole blocks were not buffered and thus were interleaved with both prefetch and demand
reads, which led to very poor disk scheduling. This highlighted the importance of buffered writes.





Summary 

TIP uses hints to convert high throughput storage to low latency where caching fails

Hints that disclose, not advise, provide the best information and are consistent with sound
SWE principles.

Applicable to local disk and network file servers

Immediate Plans

• modify Coda/BSD/Mach to accept and exploit correct hints

• find & instrument applications

> make, search, visualization, simulation


Transparent Informed Prefetching, TIP, extends the power of caching and prefetching to reduce
both local and remote file read latency by exploiting application-level knowledge of future access patterns.
TIP systems can cooperate with resource management policies to increase the utilization and efficiency of
high-throughput network and storage systems. Many future accesses become current accesses that can
exploit the parallelism of disk arrays or may be batched to reduce network overheads. Disk accesses and buffer
allocation may be improved with foreknowledge of future accesses. TIP effectively converts the high
throughput of new peripheral technology into low read latency for application programs.

Informed prefetching depends on hints from applications that disclose their future I/O accesses in
terms of operations on files. Hints should not give advice about I/O subsystem operation nor be expressed
in terms of resource management policy options. This distinction is important for hint portability and
consistency with software engineering principles of modularity, and for the TIP system to be able to effectively
manage global resources.

Preliminary tests have confirmed the potential benefits of informed prefetching and highlighted
some of the potential pitfalls of implementation.

Our next step is to implement TIP in a Coda/BSD/Mach operating system. Then we will identify
and instrument applications to provide the required hints to the system.





References


[Baker91]           Baker, M.G., Hartman, J.H., Kupfer, M.D., Shirriff, K.W., and Ousterhout,
                    J.K., "Measurements of a Distributed File System," Proc. of the 13th Symp. on
                    Operating Systems Principles, Pacific Grove, CA, October 1991, pp. 198-212.

[Feiertag71]        Feiertag, R.J., Organick, E.I., "The Multics Input/Output System," Proc. of
                    the 3rd Symp. on Operating Systems Principles, 1971, pp. 35-41.

[Kotz91]            Kotz, D., Ellis, C.S., "Practical Prefetching Techniques for Parallel File
                    Systems," Proc. First Int'l Conf. on Parallel and Distributed Information Systems,
                    Miami Beach, Florida, Dec. 4-6, 1991, pp. 182-189.

[McKusick84]        McKusick, M.K., Joy, W.J., Leffler, S.J., Fabry, R.S., "A Fast File System for
                    UNIX," ACM Trans. on Computer Systems, V 2 (3), August 1984, pp. 181-197.

[Miller91]          Miller, E.L., "Input/Output Behavior of Supercomputing Applications,"
                    University of California Technical Report UCB/CSD 91/616, January 1991, Master's
                    Thesis.

[Miller91b]         Miller, Ethan, private communication.

[Mummert92]         Mummert, L., Satyanarayanan, M., "Efficient and Portable File Reference
                    Tracing in a Distributed Workstation Environment," Carnegie Mellon
                    University, manuscript in preparation.

[Ousterhout85]      Ousterhout, J.K., Da Costa, H., Harrison, D., Kunze, J.A., Kupfer, M., and
                    Thompson, J.G., "A Trace-Driven Analysis of the UNIX 4.2 BSD File System,"
                    Proc. of the 10th Symp. on Operating Systems Principles, Orcas Island, WA,
                    December 1985, pp. 15-24.

[Patterson88]       Patterson, D., Gibson, G., Katz, R.H., "A Case for Redundant Arrays of
                    Inexpensive Disks (RAID)," Proc. of the 1988 ACM Conf. on Management of Data
                    (SIGMOD), Chicago, IL, June 1988, pp. 109-116.

[Rosenblum91]       Rosenblum, M., Ousterhout, J.K., "The Design and Implementation of a
                    Log-Structured File System," Operating Systems Review (Proceedings of the 13th
                    SOSP), Volume 25 (5), October 1991, pp. 1-15.

[Satyanarayanan85]  Satyanarayanan, M., Howard, J., Nichols, D., Sidebotham, R., Spector, A.,
                    West, M., "The ITC Distributed File System: Principles and Design,"
                    Proceedings of the Tenth Symposium on Operating Systems Principles, ACM,
                    December 1985, pp. 35-50.

[Satyanarayanan90]  Satyanarayanan, M., Kistler, J.J., Kumar, P., Okasaki, M.E., Siegel, E.H.,
                    Steere, D.C., "Coda: A Highly Available File System for a Distributed
                    Workstation Environment," IEEE Transactions on Computers, V C-39 (4), April 1990.

[Seltzer90]         Seltzer, M.I., Chen, P.M., Ousterhout, J.K., "Disk Scheduling Revisited,"
                    Proc. of the Winter 1990 USENIX Technical Conf., Washington DC, January
                    1990.

[Smith85]           Smith, A.J., "Disk Cache-Miss Ratio Analysis and Design Considerations,"
                    ACM Trans. on Computer Systems, V 3 (3), August 1985, pp. 161-203.

[Spector89]         Spector, A.Z., Kazar, M.L., "Wide Area File Service and The AFS
                    Experimental System," Unix Review, V 7 (3), March 1989.

*U.S. GOVERNMENT PRINTING OFFICE: 1993-728-150/60025



REPORT DOCUMENTATION PAGE

Form Approved
OMB No. 0704-0188


1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE

April 1993


4. TITLE AND SUBTITLE

Goddard Conference on Mass Storage Systems and Technologies
Volume I


6. AUTHOR(S)

Ben Kobler and P. C. Hariharan, Editors


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Goddard Space Flight Center
Greenbelt, Maryland 20771

93B00038
Code 902


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

National Aeronautics and Space Administration
Washington, DC 20546-0001


10. SPONSORING/MONITORING AGENCY REPORT NUMBER

NASA CP-3198, Vol. I


11. SUPPLEMENTARY NOTES

Kobler: Goddard Space Flight Center, Greenbelt, MD;
Hariharan: STX Corporation, Lanham, MD.


12a. DISTRIBUTION/AVAILABILITY STATEMENT

Unclassified - Unlimited
Subject Category 82


12b. DISTRIBUTION CODE


13. ABSTRACT (Maximum 200 words)

This report contains copies of nearly all of the technical papers and viewgraphs presented at the
Goddard Conference on Mass Storage Systems and Technologies held in September 1992. Similar to
last year's conference, this year's gathering served as an informational exchange forum for topics
primarily relating to the ingestion and management of massive amounts of data and the attendant
problems (data ingestion rates now approach the order of terabytes per day). Discussion topics include
the IEEE Mass Storage System Reference Model, data archiving standards, high-performance storage
devices, magnetic and magneto-optic storage systems, magnetic and optical recording technologies,
high-performance helical scan recording systems, and low end helical scan tape drives. Additional
discussion topics addressed the evolution of the identifiable unit for processing purposes (file, granule,
data set or some similar object) as data ingestion rates increase dramatically, and the present state of
the art in mass storage technology.


14. SUBJECT TERMS

Magnetic tape, magnetic disk, optical disk, mass storage, software storage


15. NUMBER OF PAGES

356


17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: Unclassified


NSN 7540-01-280-5500


Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18
298-102





















